It is a matter of common knowledge that a mind which for any
reason becomes engaged in an activity and finds itself repeatedly and
persistently failing therein, is impelled to intermit or abandon it. The
person does abandon it unless this impulsion is counterbalanced by
some contrary force, such as the hope of a turn of the tide toward
success, or an inner sense of worth from maintaining the activity, or a
fear that worse will befall him if he stops.

It seems probable that during a long unrewarded activity when
abandonment or overt intermission is not permissible, the mind will
tend to relax its efforts and do work less in quantity, or lower in
quality, or inferior in both. We have sought to discover whether
this is true and, in particular, whether it is beyond ordinary voluntary
control. The experiments to this end are all of the following nature:
the subject is required to do his best with a long series of tasks (three
hundred sixty or more), upon each of which he is to spend five seconds
(or seven and one-half or ten seconds, in some of the experiments).

Each series consists of (A) groups of five consecutive tasks, each
group such that the subject will certainly do one and will average two
or three, and (B) groups of fifty or one hundred consecutive tasks of
three sorts. In B1 none or only a very small percentage of the tasks
can be accomplished within the time allowed. In B2 the percentage
of the fifty or one hundred that can be accomplished is larger. In B3
the percentage is much larger.


1 The investigations reported in this article were made possible by a grant from 
the Carnegie Corporation.
We compare the work done at an A group of five tasks following a
B group of fifty or one hundred rich in successes with that done at an 
A group of five tasks following a B group of fifty or one hundred poor 
in successes, or almost entirely devoid of such. The subjects were 
college undergraduates and educated adults. The tasks were (I, 
rhymes) writing words to rhyme with given words, (II, completions) 
supplying letters to complete words, (III, anagrams) writing words 
made from given letters, (IV, opposites) writing the opposites of given 
words, and (V, equations) making true equations from given numbers 
and signs. The time per task was five seconds for I and II, seven and 
one-half seconds for III and IV, and ten seconds for V. Samples 
of the series with fewest chances for success, and of the instructions 
given to the subjects, are shown below. Lack of space prevents 
showing samples of the sets with a moderate number and with many 
chances for success. 


SAMPLES OF B 11-110 FOR RHYMES I, COMPLETIONS I, AND ANAGRAMS I, THAT IS,
FOR THE SETS WITH FEWEST CHANCES FOR SUCCESSES

Rhymes I: 17. almost; 18. animate; 20. archive; 23. balsam; 58. capsule;
59. caramel; 60. carpet; 61. cathedral; 62. caustic; 64. cribbage;
97. euphony; 98. exodus; 100. fetid; 102. gelatin; 103. general;
104. hackney

Completions I: 20. ca.a.al..e; 62. la.y.g.a.; 63. me.di.a..; 64. men.a.e.

Anagrams I: 18. acehrtty; 20. aaceilmn; 22. hmstuy; 58. aceilmos;
59. aehprty; 60. aaciilrv; 62. ioorssuv; 64. aegilruvz; 103. aceilnuvz;
104. ennox
SAMPLES OF THE HARDEST B SET OF OPPOSITES

24. investiture; 47. charlatan; 50. dichotomous; 51. subjunctive;
52. caparisoned

SAMPLES OF A HARD B SET OF DISARRANGED EQUATIONS

162. 2 2 3 4; 165. 1 1 3 4; 166. 2 4 4 4; 168. 2 2 4 8; 169. 3 3 4 5


In Experiment A, twenty-five educated adults had Rhymes FA
1-5, FA 6-10, FB 11-110, FA 111-115, FA 116-120; MA 1-5, MA
6-10, MB 11-110, MA 111-115, MA 116-120; JA 1-5, JA 6-10,
JB 11-110, JA 111-115 and JA 116-120, in that order, with rests of
five seconds after each one hundred twenty. F, M, and J refer to
sets of one hundred twenty in which tasks 11-110 contained frequent
opportunities for success, fewer, and very few, respectively. Then,
after an intermission of five minutes used to collect their records
and give the material and instructions for the completion tasks, they
had Completions JA 1-5, JA 6-10, JB 11-110, JA 111-115, JA
116-120, FA 1-5, FA 6-10, FB 11-110, FA 111-115, FA 116-120,
MA 1-5, MA 6-10, MB 11-110, MA 111-115 and MA 116-120.
Then, after an intermission of five minutes used to collect these records
and to give the instructions for the tasks with anagrams, they had
Anagrams MA 1-5, MA 6-10, MB 11-110, MA 111-115, MA 116-120,
FA 1-5, FA 6-10, FB 11-110, FA 111-115, FA 116-120, JA 1-5, JA
6-10, JB 11-110, JA 111-115, JA 116-120. The results appear in
Table I.

In Experiment B, twenty-one college undergraduates (all girls) had
the same tasks as the group in Experiment A, but in this order: Rhymes
M, J, F; Completions F, M, J; Anagrams F, J, M. The results
appear in Table II.

In Experiments C1, C2, C3, C4, and C5, each of forty-four college
undergraduates had the three sets of Anagrams, with various shifts 
of the first and last tens, so as to eliminate or allow for any differences 
in difficulty in these. 

After allowance for the differences in difficulty, Experiment C gives
losses from 6-10 to 111-115 as follows:

C1 and C2: in F, .75; in M, -1.48; in J, 1.46.

C3, C4, and C5: in F, .55; in M, -.88; in J, 1.51.

The numbers correct out of a hundred were about thirty, twenty 
and ten for rhymes, thirty, twenty and ten for completions, and 
fifteen, ten and one for the anagrams or disarranged letters. But the 
subjects may have considered some of their wrong responses as 


TABLE I. CHANGES IN THE EFFICIENCY OF WORK AFTER PERIODS WITH MANY,
FEWER, AND VERY FEW REWARDS. EXPERIMENT A. TWENTY-FIVE
EDUCATED ADULTS

F = Frequent; M = Fewer; J = Very Few

                      1      2      3      4      5      6      7      8      9
Rhymes F            2.44   3.24   27.6   46.0   2.48   3.12    .76    .09    .67
Completions F       1.92   1.48   27.1   42.8   3.44   1.40  -1.96  -1.34   -.62
Anagrams F          2.56   0.64   13.4   18.5   1.04   1.28   -.40   -.33   -.07
Rhymes M            3.20   2.92   18.2   42.5   3.04   3.76   -.12    .00   -.12
Completions M       3.32   1.48   17.8   34.3   2.20   1.82   -.72   -.44   -.28
Anagrams M          1.20   0.32    5.9    9.4   1.08   2.36   -.76   -.22   -.52
Rhymes J            2.40   3.20    8.3   31.1   3.04   2.68    .16    .00    .16
Completions J       3.00   2.20    8.0   13.9   2.16   1.28    .04    .45   -.41
Anagrams J          2.16   1.08    0.9    2.8   0.24   1.60    .84    .44    .40
successes. The numbers of responses of any sort were about forty-five,
forty and thirty for rhymes, forty, thirty and twenty for completions, 
and eighteen, twelve, and two for anagrams. 

The approximate relative difficulty of the various A groups of five 
tasks was determined from nine subjects who had the separate tasks 
of all the groups of any one sort in an order roughly equalized for 
each group in a control test with no B tasks. There are differences 
for which allowance is made in Tables I and II. 
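The arithmetic behind the allowance for difficulty can be checked directly against Table I. The tables do not label their columns, so this sketch rests on an assumption inferred from the figures: column 2 is the score on A 6-10, column 5 the score on A 111-115, column 7 the raw loss (column 2 minus column 5), column 8 the allowance for difficulty from the control test, and column 9 the loss after that allowance (column 7 minus column 8). The values are copied from Table I; `check_row` is a hypothetical helper.

```python
# Rows copied from Table I (Experiment A). Column meanings are an assumption:
# (col2 = A 6-10, col5 = A 111-115, col7 = raw loss,
#  col8 = allowance for difficulty, col9 = loss after allowance)
TABLE_I = {
    "Rhymes F":      (3.24, 2.48, 0.76, 0.09, 0.67),
    "Completions F": (1.48, 3.44, -1.96, -1.34, -0.62),
    "Anagrams F":    (0.64, 1.04, -0.40, -0.33, -0.07),
    "Rhymes M":      (2.92, 3.04, -0.12, 0.00, -0.12),
    "Completions M": (1.48, 2.20, -0.72, -0.44, -0.28),
    "Anagrams M":    (0.32, 1.08, -0.76, -0.22, -0.52),
    "Rhymes J":      (3.20, 3.04, 0.16, 0.00, 0.16),
    "Completions J": (2.20, 2.16, 0.04, 0.45, -0.41),
    "Anagrams J":    (1.08, 0.24, 0.84, 0.44, 0.40),
}


def check_row(col2, col5, col7, col8, col9, tol=0.025):
    """True if col7 == col2 - col5 and col9 == col7 - col8, within a rounding tolerance."""
    raw_ok = abs((col2 - col5) - col7) <= tol
    adjusted_ok = abs((col7 - col8) - col9) <= tol
    return raw_ok and adjusted_ok


for name, row in TABLE_I.items():
    print(f"{name:15s} consistent: {check_row(*row)}")
```

Every row satisfies both relations; the tolerance absorbs Anagrams M, where column 9 differs from column 7 minus column 8 by .02, presumably from rounding.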


TABLE II. CHANGES IN THE EFFICIENCY OF WORK AFTER PERIODS WITH MANY,
FEWER, AND VERY FEW REWARDS. EXPERIMENT B. TWENTY-ONE
COLLEGE UNDERGRADUATES

F = Frequent; M = Fewer; J = Very Few

                      1      2      3      4      5      6      7      8      9
Rhymes F            3.71   3.67     ..     ..   3.62   3.43    .05    .09   -.04
Completions F       3.24   1.62   32.3   38.5   3.38   1.67  -1.76  -1.34   -.42
Anagrams F          2.76   1.19   17.4   21.1   1.48   2.52   -.29   -.33    .04
Rhymes M            4.19   3.90     ..     ..     ..   4.24   1.00    .00   1.00
Completions M       3.19   2.48   23.5   31.8   3.05   1.86   -.57   -.44   -.13
Anagrams M          2.38   1.05   12.0   14.1   2.76   3.38  -1.71   -.22  -1.49
Rhymes J            2.71   3.38     ..     ..   3.10   3.29    .28    .00    .28
Completions J       3.81   3.00   10.6   16.5   2.76   2.24    .24    .45   -.21
Anagrams J          2.62   2.43    1.3    2.6   0.43   2.19   2.00    .44   1.56

The best comparison is between the five tasks just before and the 
five just after the hundred. In the very first five the subject may be 
getting under way and by the very last five he may have lost the 
effect of rarity of successes. 

Allowing for the differences in difficulty of the early and late 
tasks the decrease in the number correct in the first five after the 
hundred below that in the last five before the hundred in Experiments
A, B and C, averages .42 for the J condition and only .16 for the F 
condition. The difference .26 is about one eighth of the early or 
late score and has a probable error of .12. The tasks done after a 
medium frequency of successes showed an increased efficiency. 
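Why this difference counts as doubtful becomes clearer if the probable error is converted to a standard error (for a normal distribution, one probable error is about 0.6745 standard errors) and the difference is expressed as a critical ratio. The figures .42, .16, and .12 come from the text; the two-sided normal p-value is an added illustration under a normal-theory assumption, not a statistic the authors report.

```python
import math

diff = 0.42 - 0.16                         # decrease under J minus decrease under F
probable_error = 0.12                      # probable error of the difference, from the text
standard_error = probable_error / 0.6745   # PE = 0.6745 * SE for a normal distribution

ratio_pe = diff / probable_error           # difference in probable-error units
z = diff / standard_error                  # difference in standard-error units
# Two-sided tail probability of a standard normal variate beyond |z|
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"D = {diff:.2f}, D/PE = {ratio_pe:.2f}, z = {z:.2f}, p = {p_two_sided:.2f}")
```

The difference is a little over two probable errors, or roughly one and a half standard errors, which is well short of a dependable effect.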

In Experiments A, B, and C there is then only a small and very 
doubtful influence of the relative frequency of successes and frustra- 
tions, as modified by the subjects’ voluntary control. 

In Experiments A, B, and C, the arrangement of the three sets of 
one hundred twenty each on separate sheets with separate numbering 
could suggest to the subject who found the last ten (111-120) fairly 
easy on the first sheet that the last ten on the next sheet would be 
easy also. When this expectation was verified he might well expect 
the same for the last ten on every sheet thereafter. His attitude 
might then shift markedly when he reached tasks 111, 112, etc.

In Experiment D the numbering was consecutive from one to 
three hundred, the test groups of tasks being Nos. 1-5, 6-10, 61-65, 
116-120, 121-125, 176-180, 231-235, 236-240, 291-295, and 296-300. 
The task was to give a single word meaning the opposite of the given 
word. The formation of opposites by prefixes such as un, in, im, il 
and non, was not allowed. 

Thirty-eight college girls worked for thirty-seven and one-half 
minutes at writing the opposites, being told to progress to number 
two, number three, number four, and so on, every seven and one-half 
seconds. There is no evidence whatever of a decline in efficiency as a 
consequence of six minutes of work with relatively infrequent rewards. 

In Experiment E, eight educated adults were required to attempt 
two hundred fifty disarranged equations, being allowed ten seconds 
for each. Nos. 11-60 were so hard that only thirteen were done 
(less than one in twenty-five per person) and only three were right. 
Nos. 136-185 and 191-240 contained a somewhat larger number 
of easy tasks, fourteen and twenty-three being done, of which seven 
and sixteen were correct. Nos. 66-115 contained many easy tasks, 
and fifty-three were done, forty-two correctly. 
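The counts above are totals over the eight subjects, so the rates per task presentation follow from dividing by 8 × 50 = 400. This sketch uses only the counts given in the text; the block labels are added for readability.

```python
SUBJECTS = 8
TASKS_PER_BLOCK = 50

# (tasks done, tasks correct), summed over the eight subjects, from the text
blocks = {
    "11-60 (hard)":  (13, 3),
    "136-185":       (14, 7),
    "191-240":       (23, 16),
    "66-115 (easy)": (53, 42),
}

presentations = SUBJECTS * TASKS_PER_BLOCK  # 400 task presentations per block
for name, (done, right) in blocks.items():
    print(f"{name:15s} done per person = {done / SUBJECTS:5.2f}  "
          f"correct per presentation = {right / presentations:.3f}")
```

For Nos. 11-60, 13 of 400 presentations were attempted, about one in thirty, which is the basis for "less than one in twenty-five per person".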

The relative difficulty of tasks 1-5, 6-10, 61-65, 116-120, 121-125, 
126-130, 131-136, 186-190, 241-245, and 246-250 (in the sense of the 
frequency of correct solutions in ten seconds) was determined by a 
special test with them alone in the case of twenty subjects (educated 
adults). 


1 We count the results of Experiments A, B and C as of equal weight. 
The eight subjects showed no appreciable reduction in achievement
after the hard tasks. After allowing for differences in difficulty, the 
difference in favor of the ten following the fifty easy tasks (66-115) 
over tasks 60-65, 186-190, and 241-250, was almost zero (.06 tasks 
per person). If only the first five following each set of fifty tasks are 
used, the achievement was actually better after tasks 11-60, 136-185 
and 191-240 than after 66-115. 

For Experiment F, two long series, one containing many easy tasks,
one containing very few, were constructed, using completions (words 
with missing letters), anagrams (words with disarranged letters), and 
equations. The former consisted of thirty tasks of varied difficulty 
(1-10 being equations, 11-20 being completions, and 21-30 anagrams), 
one hundred twenty completions, one hundred twenty anagrams, 
sixty-five equations, eight completions and eight anagrams. The 
latter consisted, in order, of thirty tasks of varied difficulty, one 
hundred hard completions, two easier completions, one hundred hard 
anagrams, one easier anagram, one hundred hard equations, two 
easier equations, fifty hard equations, ten equations of varied difficulty, 
ten completions of varied difficulty, and ten anagrams of varied 
difficulty. 

In this hard series sixty educated adult subjects worked for fifty 
minutes with very few successes.1 We compare their achievements 
at the two easier completions, one easier anagram, and two easier 
equations which came as oases in a desert of difficulty with their 
achievements at comparable tasks in the easy series. We also com- 
pare their achievements in fourteen of the thirty tasks at the end of 
the hard series with their achievements in fourteen of equal difficulty 
occurring at the end of the easy series. The facts appear in Table III. 
In the inserted tasks there is no observable difference. In the final 
tasks, they do somewhat better after the easy than after the hard 
series, scoring four hundred sixty-five correct in place of three hundred
fifty-seven. 
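The totals quoted here can be recomputed from the per-task counts in Table III. This sketch copies the five inserted-task counts for groups A (twenty-eight adults) and B (thirty-two adults), and only the printed column sums for the fourteen final tasks, since some individual final-task cells are illegible.

```python
# Successes on the five inserted tasks, copied from Table III
inserted = {
    "hard": {"A": [21, 8, 17, 22, 21], "B": [28, 10, 17, 16, 27]},
    "easy": {"A": [26, 15, 3, 16, 28], "B": [30, 15, 8, 12, 27]},
}
# Printed column sums (A, B) for the fourteen final tasks in Table III
final_totals = {"hard": (173, 184), "easy": (238, 227)}

for series, groups in inserted.items():
    total = sum(groups["A"]) + sum(groups["B"])
    print(f"inserted tasks, {series} series: {total}")

hard_final = sum(final_totals["hard"])
easy_final = sum(final_totals["easy"])
print(f"final tasks: {hard_final} vs {easy_final}, ratio {hard_final / easy_final:.2f}")
```

The inserted tasks give 187 successes in the hard series against 180 in the easy series, while the final tasks after the hard series yield a little over three quarters of the successes obtained after the easy series.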

In Experiment G thirty educated adults or adolescents over 18.0 
worked with a long easy and long hard series much like those of 





1 In the case of a random half of the sixty subjects, the median number of
correct responses out of the three hundred fifty not in the tests was fourteen. The
average number was sixteen. There were very few wrong responses which the
person in question might have regarded as right, almost none in the anagrams and
equations. A liberal allowance for these would leave the median number of actual
or supposed rights under one in fifteen.
Experiment F. The tasks were, however, dictated and the subject
had to copy each (except for the opposites) before writing the solution. 
Consequently fewer tasks were used and more time per task (ten 
seconds for a completion, twelve seconds for a word with disarranged 
letters, and fifteen seconds for a disarranged equation). Also both 
series had six test completion tasks, five test disarranged-word tasks 


TABLE III. FREQUENCY OF SUCCESSES WITH TASKS OF EQUAL DIFFICULTY
INSERTED IN OR AT THE END OF A HARD SERIES (ONE HUNDRED
COMPLETIONS, ONE HUNDRED ANAGRAMS AND ONE HUNDRED
FIFTY EQUATIONS), OR OF AN EASY SERIES (ONE HUNDRED
TWENTY COMPLETIONS, ONE HUNDRED TWENTY ANAGRAMS
AND SIXTY-FIVE EQUATIONS). TWENTY-EIGHT EDUCATED
ADULTS (A), AND THIRTY-TWO EDUCATED ADULTS (B)

Inserted Tasks
                  In hard series                In easy series
                  Task             A     B      Task             A     B
Completions       ..              21    28      ..              26    30
                  ..               8    10      ..              15    15
Anagram           adert           17    17      celos            3     8
Equations         ..              22    16      ..              16    12
                  4 5 20 × =      21    27      2 4 8 × =       28    27
Sum                               89    98 (187)                88    92 (180)

Final Tasks
Equations         3 4 12 × =      27    29      2 3 5 = +       28    29
                  ..              23    27      ..              23    28
                  ..              18    11      ..              22    19
                  ..              12    11      2 5 6 60 × × =  26    22
                  ..               1     1      ..               8     2
Completions       ..              17    23      ..tion          25    25
                  ..              15    12      ..              24    20
                  .ugh            13    11      no.s.s          14    14
                  .int             9     8      ..               8     7
                  ..              ..    ..      ..nic            4     8
                  ..              ..    ..      ..              ..    ..
Anagrams          eimpr           13    17      ekops           17    15
                  kloy            11    14      elrru           13    21
                  akprs            6    11      eory            22    17
Sum                              173   184 (357)               238   227 (465)


Influence of Successes and Frustrations 249 


and five test equations inserted at the end of about eighteen, thirty- 
three, and fifty-two minutes in the course of the work period. The 
work-period was sixty-four minutes for the easy, and sixty-three and 
one-half minutes for the hard, series. Successes occurred in about 
one out of four tasks in the easy series, and in about one out of twenty 
in the hard series. The sixteen inserted tasks in the hard series 
showed 94 per cent as many successes as the sixteen of equal difficulty 
inserted in the easy series. The sixteen at the end of the hard series 
showed 95 per cent as many successes as the sixteen of equal difficulty 
at the end of the easy series. These results cannot be taken quite at 
their face value because, in trials five days earlier with the easy series 
and four days earlier with the hard series, the easier series was run 
with times of seven and one-half, ten, and twelve seconds, whereas the 
hard was run as stated above. The memory advantage was thus
somewhat greater for the hard series. But a generous
allowance for this would still leave the percentages near ninety. 

On the whole, it appears that the influence of infrequency of success 
in work periods up to fifty minutes is largely subject to voluntary 
control through the summoning of contrary forces, in the case of 
educated adults. 

There is little or no fundamental depression of the intellectual 
apparatus by work with infrequent success for which a suitable 
voluntary control cannot compensate. Frequent frustration causes 
irritation and loss of interest, but not any large loss in power to attend 
to the frustrating tasks. 

No sensible person would plan educational work so that a pupil 
would fail ten or twenty times for each success. But the harm done 
by such work is not as great as it might be. In these college students 
and adults, intellect is resilient and, within limits, will operate when, 
as, and if it has a chance. It can be kept alert by external motives 
even though it suffers frustration in a very high percentage of its 
efforts. 


APPENDIX 


THE INFLUENCE OF THE RELATIVE FREQUENCY OF SUCCESS 
AND FRUSTRATIONS UPON LEARNING 


Experiment G provides evidence concerning the impairment in 
learning, i.e., profit from the experience, due to a very high versus a
moderate proportion of frustrations. We use as material, first the 
records with five and six tasks inserted at three points in a “hard”
series containing on the average only one success in twenty and those
of five, five and six tasks of equal difficulty inserted at three points
in an “easy” series containing on the average one success in four. We
observe, in the case of each task, whether it was done correctly in the
first trial with the series. If it was not, no more attention is paid to
it, but if it was, we observe whether it was also done correctly in a
second trial which occurred four days later in the case of the “hard”
series and five days later in the case of the “easy” series. The
successes in the first trial of the “hard” series showed 84.4 per cent
of successes in the second. The successes in the first trial of the
“easy” series showed 85.8 per cent. The advantage for the easy
series would probably have been a little greater if the time interval 
had been four days for it, and if certain minor disturbing factors 
which operated on the first trial of the easy series had been avoided. 
Possibly the 85.5 might have been as high as 90. 

Next we make a similar comparison, but using the records for 
sixteen tasks located at the end of the “hard” series and sixteen at the
end of the “easy” series. The percentages for the repetitions of
success are 76.2 (hard) and 81.2 (easy). For the reasons just given 
the 81.2 is probably a bit too low, and might, with perfect equality of 
conditions other than the frequency of successes during the work 
period, have been as high as 85. 

There is thus some impairment of the ability to learn when the 
mind is suffering, and has been suffering, very frequent frustrations. 
We can determine roughly how much by finding the corresponding 
percentages of repetition for tasks done correctly in thirty of varying 
difficulty given at the beginning of the “hard” and thirty of varying
difficulty given at the beginning of the “easy” series. These were
89.5 and 80.7 respectively. In the hard series, then, the inserted 
and final tasks were learned about nine tenths as often as the beginning 
tasks. In the easy series, the inserted and final tasks were learned 
about three per cent oftener than the beginning tasks. Learning is 
reduced by about one eighth by the greater frequency of frustrations. 
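The chain of ratios in this paragraph can be reproduced from the five percentages reported above. The sketch assumes the inserted and final tasks are weighted equally when combined, which matches their equal numbers (sixteen each) but is not stated explicitly in the text; `relative_learning` is a hypothetical helper.

```python
# Percentage of first-trial successes repeated in the second trial (Appendix)
hard = {"beginning": 89.5, "inserted": 84.4, "final": 76.2}
easy = {"beginning": 80.7, "inserted": 85.8, "final": 81.2}


def relative_learning(series):
    """Mean of the inserted and final percentages, relative to the beginning tasks.

    Assumption: inserted and final tasks carry equal weight.
    """
    later = (series["inserted"] + series["final"]) / 2
    return later / series["beginning"]


r_hard = relative_learning(hard)   # roughly nine tenths
r_easy = relative_learning(easy)   # roughly three per cent above the beginning tasks
reduction = 1 - r_hard / r_easy    # share of learning lost under frequent frustration
print(f"hard {r_hard:.3f}, easy {r_easy:.3f}, reduction {reduction:.3f}")
```

The reduction comes out near .13, in line with the "about one eighth" stated in the text.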

This result should be checked by further experiments, though the 
time-cost of making and scoring such is considerable, and the probabil- 
ity is that the influence upon learning will be of the same general nature 
and degree as the influence upon achievement. 
AN EVALUATION OF FIVE-, TEN-, AND FIFTEEN-ITEM
REARRANGEMENT TESTS 


VERNER MARTIN SIMS 


University of Alabama 


The rearrangement or continuity test is more or less unique among 
the types of objective tests in that it is concerned with the relations 
which exist among a series of items rather than with the items them- 
selves. Since objective tests are so often criticised because they deal 
too much with isolated elements, such a test should fill a valuable 
place in the testing program. Odell, Land, Ruch, and Cureton and
Dunlap suggest the wide applicability of the test. The writer is
inclined to believe that in practically every subject taught in high 
school and college there is, or should be, concern over relating items 
to some designated basis. Wherever such relationships have been 
taught adequate measurement necessitates testing of the learning, 
and, providing it can be shown to give good measurement, the rear- 
rangement test should be found useful. However, because the test 
differs radically from the conventional objective tests, one cannot 
assume without experimental evidence the extent to which it meets 
the commonly accepted standards of a good test.

It was the general purpose of this study to evaluate the measuring
qualities of the rearrangement test and, in particular, to attempt to
determine the relation between length of set and satisfactoriness of
measurement. Obviously it was not practical to evaluate all possible
lengths. The study was limited to this question: To what extent do
five-item, ten-item, and fifteen-item rearrangement tests meet certain
criteria of a good test?

Ninety items, center heads from eight chapters in Dashiell’s 
Fundamentals of Objective Psychology, arranged in the order in which 
they had been presented in the course, were divided into three equiva- 
lent lists of thirty items each by placing every third item in a particular 
list. From each of these lists a rearrangement test was constructed. 
The test made from the first list consisted of six sets of five items (the 
five-item test), that from the second consisted of three sets of ten 
items (the ten-item test), and that from the third list consisted of two 
sets of fifteen items (the fifteen-item test). The items for the five- 
item sets were selected by taking every sixth item from the list, those 
for the ten-item sets by taking every third item from the list, and those
for the fifteen-item sets by taking every second item. Thus each 
set sampled the entire range of the materials. The items within a set 
were presented in a purely random order, the test being to arrange 
them in the order in which they had been presented in the 
course. 
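The construction of the three tests amounts to two rounds of interleaved sampling. In this sketch the ninety center heads are replaced by hypothetical placeholder labels, and `make_sets` is a hypothetical helper; the slicing reproduces the stated rules (every third item to a list, then every sixth, third, or second item to a set), and a random shuffle stands in for the "purely random order" of presentation.

```python
import random

# Hypothetical stand-ins for the ninety center heads, in course order
items = [f"head{i:02d}" for i in range(1, 91)]

# Three equivalent lists of thirty: every third item goes to the same list
lists = [items[k::3] for k in range(3)]


def make_sets(lst, set_size):
    """Split a list into sets of set_size by taking every n_sets-th item,
    so that each set samples the whole range of the material."""
    n_sets = len(lst) // set_size
    return [lst[s::n_sets] for s in range(n_sets)]


five_item_test = make_sets(lists[0], 5)      # six sets, every sixth item
ten_item_test = make_sets(lists[1], 10)      # three sets, every third item
fifteen_item_test = make_sets(lists[2], 15)  # two sets, every second item

# Items within a set are presented in random order; the key is the course order
presented = [random.sample(s, len(s)) for s in five_item_test]
```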

The three tests were given to one hundred forty-two students who 
had just completed the unit of work and had prepared for a review 
test. Each student kept the time necessary for taking each of the 
tests. From these data a comparison of the five-, ten-, and fifteen- 
item tests was made. 

The scoring of the rearrangement test has been a much controverted 
subject.* Theoretically, the correlation method seems to be most 
satisfactory; but the time necessary to run the correlations is great, 
and, since the number of items used in such tests must of necessity be 
limited, the coefficients obtained can be nothing more than rough 
approximations to the true relationship. Further, Odell shows that a 
method suggested in more or less the same form by Wilson, by Sangren 
and Woody, and by Odell (where the score is the difference between 
the sum of the maximum possible deviations, and the sum of the 
individual’s deviations, from the true order) correlates highly with 
both rho and the “foot-rule” methods of correlation. We have 
assumed that this method is valid and from it derived a formula which 
for the purposes of this study, at least, is more satisfactory. (Inci- 
dentally, whenever teachers determine grades by accumulating raw 
scores from various types of tests, the method of scoring outlined 
below will probably be found more satisfactory than other methods 
proposed because it does not give added weight to the rearrangement 
test as the other methods do.) 

For our immediate purposes, the most serious objection to the
method is the impossibility of comparing scores on tests made up of sets
of different length, because the maximum possible score is determined
by the length of set used rather than by the number of items. To 
illustrate, using this method of scoring one could not compare scores 
on thirty items presented in sets of five with thirty items presented 
in sets of fifteen, because the maximum possible score on the six 
five-item sets would be seventy-two points (six times twelve) while 
on the two fifteen-item sets it would be two hundred twenty-four 
(two times one hundred twelve). 
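The maxima of twelve and one hundred twelve follow from the largest possible sum of deviations for a set of N items, which is N²/2 for even N and (N² − 1)/2 for odd N. This sketch assumes a deviation is the absolute difference between an item's position in the student's order and its true position; under that assumption a brute-force search over all orderings of short sets agrees with the closed form, and the closed form reproduces the figures in the text.

```python
from itertools import permutations


def max_deviation_brute(n):
    """Largest sum of |placed position - true position| over every order of n items."""
    return max(sum(abs(pos - item) for pos, item in enumerate(order))
               for order in permutations(range(n)))


def max_deviation(n):
    """Closed form: n**2 / 2 for even n, (n**2 - 1) / 2 for odd n."""
    return n * n // 2


# Brute force is only feasible for short sets
for n in range(2, 8):
    assert max_deviation_brute(n) == max_deviation(n)

print(6 * max_deviation(5))    # six five-item sets    -> 72
print(2 * max_deviation(15))   # two fifteen-item sets -> 224
```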
The relationship which exists between the number of items in a
set and the maximum possible deviation suggested a solution. The 
method of scoring under consideration may be expressed by the 
formula 


$$S = \Sigma D - \Sigma d \tag{1}$$

when ΣD is the maximum possible deviation, and Σd is the individual’s
deviation, from the true order. When the number of items to be
rearranged (N) is even,

$$\Sigma D = \frac{N^2}{2}. \tag{2}$$

Consequently, formula (1) was transformed by substituting the value
N²/2 for ΣD, and dividing by the ratio of ΣD to N.

Simplified, this becomes

$$S' = N - \frac{2\,\Sigma d}{N} \tag{3}$$

and the maximum possible score is N rather than ΣD, while the worst
possible score is still zero.

When the number of items is odd

$$\Sigma D = \frac{N^2 - 1}{2} \tag{4}$$

and the same procedure reduces to

$$S' = N - \frac{2N\,\Sigma d}{N^2 - 1}. \tag{5}$$

This form is not simple enough to be practical. However, inspection
shows that formula (3) exceeds it by Σd/(NΣD), a function that will
never exceed 1/N score points (the value when the student makes the
worst possible arrangement). This is a negligible quantity if scores
are to be expressed as whole numbers; consequently, for all practical
purposes, formula (3) holds whether N is odd or even.
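The size of the error from using formula (3) for odd N can be checked numerically. This sketch implements formula (3) and the exact odd-N score of formula (5) (written generally as N − NΣd/ΣD, which also covers even N); the function names are hypothetical. For a five-item set the gap equals Σd/(NΣD) and reaches 1/N = 0.2 only for the fully reversed order.

```python
def score_approx(sum_dev, n):
    """Formula (3): S' = N - 2*sum_dev/N, used for both odd and even N."""
    return n - 2 * sum_dev / n


def score_exact(sum_dev, n):
    """S' = N - N*sum_dev/max_dev: formula (3) for even N, formula (5) for odd N."""
    max_dev = n * n // 2   # N^2/2 for even N, (N^2 - 1)/2 for odd N
    return n - n * sum_dev / max_dev


n = 5
for sum_dev in (0, 6, 12):   # perfect, middling, and fully reversed order
    approx = score_approx(sum_dev, n)
    exact = score_exact(sum_dev, n)
    print(sum_dev, round(approx, 2), round(exact, 2), round(approx - exact, 2))
```

For even N the two functions coincide, which is why formula (3) needs no correction in that case.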

Since for a given length of set Σd is the only variable, it was a
simple matter to prepare a table showing scores for all possible sums of
deviation (Σd) for the three different length sets. The score for a set
could then be found by summing the deviations from the true order
* Corrected for attenuation.


and reading from the table. 
The total score for a test was the 
sum of the scores on the sets. 
By means of this procedure the 
scores for each student on the 
three tests were determined.
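A score table of the kind described can be generated by enumerating every possible arrangement of a set. This is practical only for short sets (a fifteen-item set has about 1.3 × 10¹² orderings), so the sketch builds the five-item table. It assumes deviations are absolute position differences and that scores from formula (3) were rounded to whole numbers; both are assumptions, and `sum_deviation` and `score_table` are hypothetical helpers.

```python
from itertools import permutations


def sum_deviation(order):
    """Sum of |position in the student's order - true position|; items are 0..N-1."""
    return sum(abs(pos - item) for pos, item in enumerate(order))


def score_table(n):
    """Map every attainable sum of deviations to round(N - 2*sum/N)."""
    attainable = {sum_deviation(order) for order in permutations(range(n))}
    return {d: round(n - 2 * d / n) for d in sorted(attainable)}


table5 = score_table(5)
print(table5)
```

Only even sums appear, because the signed deviations of any arrangement add to zero, so their absolute values must add to an even number.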

The three tests were evaluated 
by comparing (1) the time neces- 
sary to take, (2) the time neces- 
sary to score, (3) the mean and 
sigma of the score, (4) the 
reliability for equivalent lengths 
and for equivalent testing times, 
(5) the intercorrelations, and (6) 
the correlation with scores on 
other tests over the same material 
and with intelligence. Table I 
presents in summary form the 
findings for each of the points. 

As one would expect, the time 
necessary to arrange the items 
increases with the length of the 
set (columns 1 and 2), the fifteen- 
item test requiring approximately 
one-third more time than the five- 
item test. But, when one com- 
pares the number of items done 
per minute with that necessary 
for other objective tests, it is seen 
that it is not abnormally large. 
For example, Ruch* recommends 
three or four recall items and 
from two to six recognition items 
per minute. The time for scoring 
a test was the time used to find 
the sum of the deviations for the 
sets, but did not include the time 
necessary to read the scores 
for the sets from the table and 
to total the score for the test. 
Two readers were used, both of whom were skilled in scoring objective
tests but neither of whom had scored rearrangement tests before. The 
amount of time needed for scoring the different tests is seen to vary 
greatly (column 3). The actual scoring, that is, checking the differ- 
ence between the student’s arrangement and the key, did not take 
longer but the adding of the deviations for the longer sets was time 
consuming. 

Assuming that difficulty is measured by the mean and sigma, there 
is no difference in the difficulty of the three lengths of sets (columns 
4 and 5). The differences between the means are negligible and the 
differences between the ranges, as shown by the sigmas, are not large 
enough to be considered significant. 

The coefficients of reliability were determined by means of the 
Spearman-Brown formula and were stepped up to equivalent lengths 
(column 6) and to equivalent testing times (column 7). Reliability 
seems to be positively related to length of set. The difference between 
the five-item and the ten-item tests is not great but whether we 
consider equal lengths or equal testing times, the fifteen-item test is 
more reliable than either of the others. The reliability for all three 
types is satisfactorily high. For example, if the five-item test were
stepped up to one hundred items the coefficient would be .90, which 
compares favorably with other objective types. However, since the 
number of items to be arranged is usually somewhat limited, one 
should perhaps stress the fact that reliability is directly related to 
length and scores on a very short rearrangement test are to be taken 
no more seriously than are scores on any other short recognition test. 
This, obviously, was the error in Worcester’s argument when he 
concluded, after a detailed study of a single nine-item set, “. . . that
the only fair method of procedure is not to use this type of test . . .”
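The stepped-up coefficients rest on the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened k times as kr / (1 + (k − 1)r). As an illustration, this sketch inverts the formula to find the thirty-item reliability implied by the statement that one hundred items would give .90; treating the five-item test as thirty items lengthened by a factor of 100/30 is an assumption, and the implied figure (about .73) is derived here, not reported in the article.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test with reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


def implied_base_reliability(r_long, k):
    """Invert Spearman-Brown: reliability of the original test, given r_long at k times its length."""
    return r_long / (k - (k - 1) * r_long)


k = 100 / 30                           # thirty items stepped up to one hundred
r30 = implied_base_reliability(0.90, k)
print(f"implied 30-item reliability: {r30:.2f}")
print(f"stepped back up to 100 items: {spearman_brown(r30, k):.2f}")
```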

Only indirect evidence of what the test measures was available. 
The internal consistency of the measurement is high. When corrected 
for attenuation due to unreliability, the three tests seem to measure 
practically the same thing (column 8). What is being measured is 
another matter. The test was not originally intended as a measure 
of achievement in the course, the material being chosen simply 
because it was convenient for the purposes of the experiment, but in 
scoring the tests a relationship between scores on this test and other 
measures of achievement was noticed. This relationship along with 
the correlation with intelligence is presented for what it may be worth. 
The regular testing program over the materials from which the test 
was built consisted of four short objective tests and an objective review
test, in all approximately two hundred fifty items of true-false, 
multiple choice, and completion types. The intelligence score was 
from the Otis Self-administering Test of Mental Ability. The coeffi- 
cients of correlation, corrected for attenuation due to unreliability, 
between these two measures and the three rearrangement tests are 
presented in columns 9 and 10. Whatever the tests measure is 
positively related to both intelligence and achievement. The correlation
between the three tests and achievement is approximately the
same, but for some unexplained reason the ten-item test shows less 
correlation with intelligence than the other two. 
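Columns 8 to 10 report coefficients “corrected for attenuation due to unreliability.” The usual correction (Spearman’s) divides the observed correlation by the geometric mean of the two reliabilities. A minimal Python sketch follows; the numbers in the example are hypothetical and are not taken from the article’s table.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between true scores from an observed
    correlation r_xy and the reliabilities r_xx and r_yy of the two measures."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    # Hypothetical values: observed r = .45, reliabilities .81 and .64.
    print(round(correct_for_attenuation(0.45, 0.81, 0.64), 3))
```

Because the reliabilities sit in the denominator, corrected coefficients can exceed 1.0 when reliability estimates are low or sampling error is large, so they are best read as rough indications, “for what it may be worth.”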

To summarize then, a rearrangement test when scored by the 
method proposed, which method has the advantage that scores are 
expressed in terms of the number of items, compares favorably with 
other types of objective tests as to reliability, time for taking, and time 
for scoring. The reliability of the test is increased by increasing the
length of the set, but the time for taking and the time for scoring
are also increased. The difficulty of the test, as measured by average
scores made, is not related to length of set used. Different length 
sets consistently measured the same thing, and the particular tests 
used here correlated positively but not too highly with other measures 
of achievement and with intelligence. It seems safe to conclude that 
wherever the desire is to measure ability to relate items to some 
designated basis the rearrangement test will give satisfactory measure- 
ment. Perhaps we might add that when the number of available 
items is limited, the longer the set the more satisfactory the measure- 
ment. This last conclusion should be limited to college students. It 
may well be that for less mature students the long set becomes more 
and more a measure of intelligence.
THE DEVELOPMENT OF READING AND STUDY
HABITS IN COLLEGE STUDENTS¹


ROY W. DEAL 
Nebraska Wesleyan University 
PREVIOUS WORK 


During the past few years institutions of secondary and higher 
education have placed much emphasis upon study difficulties and the 
causes of failures among the students. Many investigations have been 
made. Reports of these studies have appeared in educational and
psychological journals, in new books, in addresses at teachers’
meetings, and elsewhere. One of the major findings has been the lack of
reading efficiency and its resultant handicaps among advanced 
students. The diagnosis of this difficulty and suggested solutions 
have been of major importance in educational literature of the past 
decade. 

College students are failing and have failed in large numbers because
of their inability to read. While developing this ability is the obligation
of the elementary school, the colleges are forced to contend with the
problem when the student arrives with a reading handicap.

Much of the experimental work on college failures in the past has 
been done in the field of reading efficiency and study habits. These 
investigations are so familiar to students of college problems that a 
review of them here is unnecessary. However, very few studies
show any long-term, systematic attempt to remedy these situations.


THE PROBLEM OF THIS STUDY 


The study reported here attempts to answer and overcome some 
of the criticisms aimed at most of these earlier investigations. In 
the first place a very definite diagnosis of the failing student was 
employed. Second, a systematic teaching procedure was employed 
to correct the difficulties. Emphasis was placed almost entirely upon 
comprehension. The rate of reading was scored but no effort was 
made to control or change it. Third, the training was more extensive 
than in any other similar investigation so far reported. Fourth, the
progress of the experimental group was scientifically and statistically
determined with a large control group as a check. Fifth, the experiment
was carried out in actual college classes with the subject matter
of the classes as the media for instruction. And sixth, an investigation
was made of the probable permanent effect of the training upon
the subjects.

1 The complete study is on file in the office of Dr. D. A. Worcester, Teachers
College, University of Nebraska.

The study reported in this article grew out of a realization of the
difficulties students meet in beginning college work. The increased
enrollment and the number of college problems caused those in charge 
of entering freshmen at The Nebraska Wesleyan University to 
make a careful study of these conditions. During the school year 
1928-1929 many faculty meetings were held with the result that the 
writer was directed to begin some remedial program for reading and 
study difficulties. Under the advice and direction of Dr. D. A. 
Worcester of the University of Nebraska, it was decided to offer a 
laboratory period for certain freshmen and sophomore students in 
classes in education courses. During the first semester of 1929-1930 
three such classes were used. The attendance was voluntary on the 
part of the students. However, emphasis was placed upon the 
importance of reading and the need for remedial work. The freshman
tests revealed to the students their weaknesses in this field. The
result was that practically the entire membership of the classes 
attended. The laboratory group met once a week in three sections 
so that each section would be small enough to handle and also to 
avoid conflicts with other college work. A total of ninety-one students 
participated during this semester. 

Three classes were used as controls. These classes were similar 
to the experimental ones in that they were taking the same work and 
were of the same rank in college. They were merely of different 
divisions. The members of the control groups were given the same 
initial and end tests and otherwise treated as the experimental students 
except that they were not given the remedial laboratory work. The 
control groups were used for the one semester only. 

Throughout the experiment all remedial teaching work and the 
grading of the test papers were done by the regular instructors of the
classes, Professor Rose B. Clark and the author. 

During the second semester of the same year a second group of 
three classes was used, with ninety-two students participating. A
third group of three classes was organized the next year, 1930-1931,
with eighty-seven students. 
GENERAL PLAN OF THE EXPERIMENT


The general plan of the experiment was to give a standardized 
reading test to both experimental and control groups at the beginning 
and at the end of the training. Part III of the Thorndike Intelli- 
gence Examination was used, different forms being used for the initial 
and end tests. The results of these tests were used as checks.

At each meeting of the laboratory group a drill test on comprehen- 
sion was given. The reading rate was also noted. For this purpose 
it was deemed best to use the materials that the students were study- 
ing in the classes. Therefore, the basic text of each class was selected 
as the material in the respective group for the remedial work. The 
instructors selected a passage each period from this text in advance of 
the class recitation. The material used for the test would thus be
material that the student was supposed to master but that was still new
to him. Each student had his own text and at the signal began reading at the
place indicated. He read for three minutes, marking the word where
he stopped when time was called. Time was checked with a stop 
watch. He then wrote down as many of the thoughts or ideas as he 
could recall. The students were instructed to make no attempt to 
memorize verbally, but to get the ideas and to express them in their 
own words. They were allowed all the time needed. When they 
had completed the writing they opened their texts and counted the 
number of words read. This number was placed on the paper with 
the date, name of the student, class, and page of the selection. 

The papers were scored by the author. The number of words read 
in the three minutes as reported by the pupil was divided by three to 
give the reading rate. This is the unit of measurement for the rate 
used in the study. Since different passages were read at each meeting 
it is not assumed that the tests were of equal difficulty from time to 
time. This would cause some fluctuation in the results of both rate 
and comprehension. However, since all the reading was done in the 
basic text of the class, none was of such ease or difficulty that the 
results would be changed to any large extent. In scoring the compre- 
hension the following method was used: The two instructors in charge 
of the laboratory read the selection together. Each idea or separate
thought was agreed upon and underlined in the instructor’s copy of 
the text. This idea was numbered and written down in a note book. 
Since the students would read different amounts, it was necessary to 
determine where each idea ended and the number of the word where it
ended. In scoring, the first thing done was to determine how many 
words the student had read, and then to count the number of ideas
from the text that were contained in this amount of reading. This
number was placed on the student’s paper with red pencil. Then 
the student’s paper was read to determine the number of these ideas 
which he had comprehended and reproduced. The number of ideas 
given correctly was placed over the number which the master text 
showed was contained in the amount which the student had read. 
All foreign ideas or personal opinions of the student were rejected. 
The percentage of comprehension was thus determined for each 
test. The number of ideas contained in a three minute reading was 
usually small, from eight to twenty. Since the texts used were ordinary 
college books, it was not a difficult task to pick out the ideas in 
the selection. 
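The scoring procedure above can be summarized as a short Python sketch. It assumes, as the text describes, that the instructors recorded for each idea the number of the word at which it ended, and that an idea counts as “contained” in a student’s reading only if it ended at or before his stopping word. The class and function names are illustrative, and the word positions in the example are invented.

```python
from dataclasses import dataclass

READING_TIME_MINUTES = 3  # each drill allowed three minutes of reading


@dataclass
class DrillPaper:
    words_read: int        # counted by the student after time was called
    ideas_reproduced: int  # ideas from the passage he recalled correctly


def reading_rate(paper: DrillPaper) -> float:
    """Words read per minute: words read in three minutes divided by three."""
    return paper.words_read / READING_TIME_MINUTES


def ideas_contained(idea_end_words: list[int], words_read: int) -> int:
    """Number of master-text ideas that end within the portion read."""
    return sum(1 for end in idea_end_words if end <= words_read)


def comprehension_percent(paper: DrillPaper, idea_end_words: list[int]) -> float:
    """Ideas reproduced correctly as a percentage of ideas contained."""
    possible = ideas_contained(idea_end_words, paper.words_read)
    if possible == 0:
        raise ValueError("no complete idea falls within the words read")
    return 100 * paper.ideas_reproduced / possible


if __name__ == "__main__":
    # Invented word positions at which the numbered ideas end.
    idea_ends = [40, 95, 160, 230, 310, 400, 470, 540, 610]
    paper = DrillPaper(words_read=540, ideas_reproduced=6)
    print(reading_rate(paper))                      # 180.0
    print(comprehension_percent(paper, idea_ends))  # 75.0
```

The sketch does not model the instructors’ judgment: rejecting foreign ideas and personal opinions happens before `ideas_reproduced` is entered.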

A large chart was constructed which contained the names of the 
students, their scores on the standardized test, and the rate and per 
cent of comprehension of each drill test as they were made week by 
week. This was placed in the laboratory so that the students had 
access to it at all times. Each student also kept a graph of his own 
record on charts prepared for that purpose. Private conference 
periods were held where the student had opportunity to study his 
own test paper. He was shown where his errors were and instructed 
in finding the ideas. All students in the lower quarter were requested
to report for these conferences.

When the drill test was completed the remainder of the laboratory 
period was spent in discovering and attempting to solve study reading 
difficulties. A list of references on “how to study” and “silent
reading” was given and assignments made for study between laboratory
meetings. Attention was given to eye movements. Unfortunately 
no apparatus was available for a more thorough individual diagnosis 
of eye difficulties. However, drills were conducted which were 
intended to develop correct movements. Some study of vocalization 
was made. 

At the beginning of the experiment it was found necessary to spend 
several periods in teaching the students how to find the ideas of the 
selection, and how to distinguish the author’s thought from the stu- 
dent’s interpretation. Several techniques were developed for this 
work. The students were asked to read and to mark any word or 
phrase or sentence that they did not understand. These were then 
analyzed in the laboratory. Each student kept a glossary of new 
words. It seemed necessary that a careful vocabulary study be
made. 

After the laboratory drill, the students were asked to practice similar
exercises at home. From the reports and comments it is believed
that most of the students were very faithful to this task. 

Another important part of the experimental work was done with 
note taking. Special references were given for outside reading, and the
actual technique was developed in the laboratory. The plan used for the
first meeting in the laboratory was to have each student take his 
text, begin reading at a designated point, then when he had found the 
first complete thought to raise his hand. When all, or most, had 
found the idea one was asked to give it, and this was then discussed. 
Another technique used was to have the student write down the ideas 
as he found them in the reading and then to compare the different 
lists. Sometimes the work was assigned as home study and the lists 
brought to the laboratory.

All through the experiment much individual help was given. 
Those who were having the most difficulty as shown by the tests were 
asked to report for private conferences. Much time was spent each 
week in this manner. Many students required regular case-study
methods.

Throughout the experiment the problem of keeping up the interest 
of the students was kept constantly in mind. Every effort was made 
to make the laboratory as usable as possible in the preparation of the 
class work. Every assignment and every drill made use of the 
materials of the unit on which the class was studying at the time. 
This, together with the charts showing the individual progress and 
the realization that they were correcting bad study habits, was suffi- 
cient to keep up the attendance. 

The problem of attention has been recognized by most workers as 
very important in the development of study and reading habits. In 
the laboratory one period was spent in the discussion of the psychology 
of attention. Special drills were devised following the plan of
Dr. H. C. Morrison.¹

During the first semester’s training the groups met only eleven 
weeks, as the organization took time and vacations interfered. The 
second semester’s laboratory began at the opening of the term. Three
new classes were used, as all the courses from which students were
taken were only one semester long. These new classes were taught
by the same instructors. Since a number of students continued the
second term, one form of the Thorndike Test was given to all who had
been in training the first term and to the new students. Much the
same plan was used as had been used during the first term. In the fall of
1930 a third group of three classes was selected for another period of
training.

1 Morrison, H. C.: “The Practice of Teaching in the Secondary School.”
Chicago, University of Chicago Press, 1924, pp. 135-150.

This report then deals with nine classes divided into three groups 
of three each. While all nine classes used were composed primarily
of freshmen and sophomores, there were a few upperclassmen in
them. In the nine classes used in the experiment, one hundred 
twenty-six members were freshmen, one hundred eight were sopho- 
mores, twenty-four were juniors, and seventeen were seniors. Of 
these sixty-four continued for two semesters and one hundred forty- 
one for one semester only. Six of the sixty-four remained for the three 
semesters. 

As an aid to the study of problem cases, certain educational and 
personality tests were used. The Kent-Rosanoff Association Test
and the Thurstone Personality Schedule were given to all, and the
Pressey X-O Test to the problem cases. However, the results of these
tests do not appear in this study.

The control groups were given the same form of the Thorndike 
Test that was given to the experimental groups. They were also 
given the initial and end comprehension drill tests. 


RESULTS AND INTERPRETATIONS OF THE DRILL TESTS 


The initial test of the drill type showed scores ranging from zero 
to seventy-one per cent of comprehension. The median score on the 
first test was thirty-three; this median was raised to sixty-seven at the 
end of a semester’s training, the lowest score being thirty-three and 
the highest one hundred. The gain was fairly regular, although a 
slight drop is shown at the seventh week, where the tests were given 
the day on which the Thanksgiving recess occurred. 

The corresponding scores for words read per minute show a range on
the first test from seventy to three hundred fifty, and on the last test 
of the semester from sixty-five to four hundred ninety. The median 
number read on the initial test was one hundred eighty, on the end test 
one hundred ninety-five. No effort was made in the study to increase
speed. These results would indicate that the growth in comprehension
was not at the expense of speed.

Chart II. Comprehension drill scores for the control group: the highest and lowest single percents of comprehension and the median and quartile percents for the control group at the beginning and at the end of the experiment.

Chart III. Rate scores for the experimental group: the greatest and least number of words read per minute and the median and quartile scores for successive weeks for the experimental group.

Chart IV. Rate scores for the control group: the greatest and least number of words read per minute and the median and quartile scores for the control group at the beginning and at the end of the experiment.

Chart I shows the lowest and highest single scores on each weekly
test and the median and quartile scores for the group for these tests. Note
that the lower quartile of the last test taken after a semester of training, 
which is fifty-seven, is thirteen per cent higher than the upper quartile 
of the first test of the semester. 

Chart II shows the highest and lowest single scores and the median 
and the quartile scores for the control group. This group consisted of 
sixty-eight cases. The median scores for both the initial and the end
tests were thirty-three, or the same as for the initial test of the
experimental group. The quartiles show some slight variations. This
would seem to point to decided gains in comprehension for the training 
groups, as the other factors were held as constant as possible. 

Chart III shows the highest and lowest single scores and the median 
and quartile scores in rate as recorded in number of words read per 
minute for the experimental group. Chart IV shows similar curves 
for the control group. The median remains almost the same for both 
groups while there is a slight variation in the quartiles. The median 
for the control group remains the same for the end test as for the initial 
test, and is the same as the median of the initial test for the experi- 
mental group. 

The lowest and highest single scores and the median and quartile 
scores of comprehension for the sixty-four who remained in training 
for the two semesters are shown in Chart V. The curves are not quite
as regular as those in the first chart, but the number of cases is much 
smaller. The median ranges from thirty-one for the first weekly test 
to eighty on the twenty-sixth week, or an actual gain of one hundred 
fifty-eight per cent. In the last six tests a total of twenty-nine cases 
showed a one hundred per cent score. 
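The percentage gain is taken relative to the initial median:

```latex
\text{gain} = \frac{80 - 31}{31} \times 100 \approx 158\ \text{per cent}
```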

Chart VI shows similar data for the number of words read per 
minute for those students who were in training two semesters. Con- 
trary to the reports of some investigators, these results do not show an 
influence of comprehension upon rate. While the median percentage
of comprehension increased one hundred fifty-eight per cent, as
shown in Chart V, the rate remained nearly constant.

In order to show the results in a different way an average of the 
first two and of the last two tests of comprehension for each individual 
student was found. The difference between these two averages
shows the amount of gain in percentage for each student. The results
show that of the two hundred two different students, three made no 
gain, while five showed a loss, that is, five produced a larger percentage
on the first two tests than they did on the last two. All other students 
showed a gain, ranging from three to sixty-seven per cent. The 
student making the greatest gain averaged nineteen per cent on the 
first two tests and averaged eighty-six per cent on the last two tests. 
So during the semester he actually increased his percentage of comprehension
by 3.53 times its initial value. The median amount of gain for all students was
twenty-seven. The lower quartile of gain was eighteen and the upper 
quartile thirty-six. 
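The individual-gain measure can be sketched in Python as below. `individual_gain` follows the text (mean of the last two drill scores minus mean of the first two). The quartiles use `statistics.quantiles` with its default interpolation method, which is an assumption: the article does not say how its quartiles were located, so small samples may give slightly different values from the author’s. The sample scores are invented except that they reproduce the nineteen-to-eighty-six case described above.

```python
from statistics import mean, median, quantiles


def individual_gain(scores: list[float]) -> float:
    """Mean of the last two comprehension scores minus mean of the first two."""
    if len(scores) < 4:
        raise ValueError("need at least four weekly scores")
    return mean(scores[-2:]) - mean(scores[:2])


def summarize_gains(gains: list[float]) -> tuple[float, float, float]:
    """Lower quartile, median, and upper quartile of a list of gains."""
    q1, _, q3 = quantiles(gains, n=4)
    return q1, median(gains), q3


def count_outcomes(gains: list[float]) -> dict[str, int]:
    """How many students gained, showed no change, or lost."""
    return {
        "gain": sum(g > 0 for g in gains),
        "no gain": sum(g == 0 for g in gains),
        "loss": sum(g < 0 for g in gains),
    }


if __name__ == "__main__":
    # Invented weekly scores whose first two average 19 and last two average 86.
    best = [17, 21, 38, 55, 70, 84, 88]
    g = individual_gain(best)
    # Print the gain and the gain as a multiple of the starting average.
    print(g, round(g / mean(best[:2]), 2))
```

A gain of sixty-seven points on a starting average of nineteen gives a ratio of about 3.53, the figure reported for the student with the greatest gain.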


RESULTS AND INTERPRETATIONS OF THE STANDARDIZED TESTS 


The results so far reported deal with the comprehension drill tests 
which were local and not standardized. This next section reports the 
results of the standardized test which consisted of four different forms 
of Part III of the Thorndike Intelligence Examination. All students 
whose records are reported in the results above took one form of this 
test at the beginning and another form at the end of a semester’s training.
A third form was given to those who remained at the end of the second 
semester. All students in the experimental group were given the tests 
during the semester they were in the laboratory. 

Chart VII shows the results for the experimental and control 
groups. The first test scores for the experimental group show a range 
from thirteen to one hundred fourteen, the highest possible score 
being one hundred twenty-eight. The median score was fifty-six. 
Over a fourth of the group comprehended less than a third of the 
possible amount, and over half the group comprehended less than 
fifty per cent. At the close of the semester of training the median 
score had risen from fifty-six to sixty-four. The high scores remained 
the same, although they were not made by the same student. For the 
control group the first test shows a range from eighteen to one hundred 
thirteen which is almost the same as for the first test of the experi- 
mental group. But the end test shows no gain in the median, in fact, 
it shows a slight loss. 

Chart V. Comprehension drill scores for the experimental group, two semesters of training: the highest and lowest single percents of comprehension and the median and quartile percents for successive weeks for those who remained in training for two semesters.

Chart VI. Rate scores for the experimental group, two semesters of training: the greatest and least number of words read per minute and the median and quartile scores for successive weeks for those who remained in training for two semesters.

Chart VII. Comparison of experimental and control groups on the Thorndike Test.

To present more conclusive evidence of the significance of the gain
shown by the experimental group on the Thorndike test, the critical
ratio of the difference of the means was computed. It was found that
the actual difference of the means, 9.65, was 6.459 times as great as
the probable error of the difference of the means, which was 1.494.
This may be regarded as practical certainty and is highly significant.
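The critical ratio above expresses the difference of the means in units of its probable error, the probable error being 0.6745 times the standard error. The Python sketch below reproduces the ratio from the two reported figures and converts it to standard-error units. How the author obtained 1.494 is not stated: `pe_of_difference` is a hypothetical helper using the standard formula, where the correlation term `r` applies if both means come from the same students (initial and end tests) and `r = 0` applies to independent groups.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)


def probable_error_of_mean(sd: float, n: int) -> float:
    """Probable error of a mean from its standard deviation and sample size."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_of_difference(pe1: float, pe2: float, r: float = 0.0) -> float:
    """Probable error of the difference of two means; r is the correlation
    between the two sets of scores (0 for independent groups)."""
    return math.sqrt(pe1**2 + pe2**2 - 2 * r * pe1 * pe2)


def critical_ratio(diff: float, pe_diff: float) -> float:
    """Difference of the means divided by its probable error."""
    return diff / pe_diff


if __name__ == "__main__":
    cr = critical_ratio(9.65, 1.494)  # figures reported in the text
    print(f"{cr:.3f} probable errors")
    print(f"{cr * PE_FACTOR:.2f} standard errors")
```

In standard-error units the reported ratio corresponds to about 4.36, well beyond the usual thresholds for significance.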

As was stated above, the classes used in this experiment were com- 
posed of freshmen and sophomores for the most part. Students from 
advanced classes expressed a desire to take a form of reading test; so it 
was administered to three other education classes and to one class in 
advanced biology. This provided an opportunity to compare reading 
scores among students in the four college classes. These results are 
shown in Table I. This would seem to indicate that students under 
ordinary conditions do not develop reading comprehension in college. 
One should recall that the freshmen were given these tests very 
shortly after they came to college and before they were oriented. In 
all cases there was a great range in reading ability, yet these students 
were in the same classes, taking the same assignments, and receiving
the same degrees and certificates upon completion. 


ARE THE RESULTS PERMANENT? 


Few studies in reading comprehension have secured any data on 
the permanence of retention. In our study an effort was made to 
collect some data bearing on this problem. In January, 1931, approximately
one year after the completion of the first semester of training,
an attempt was made to secure as many students as possible who had 
participated in the experimental group and to test them again. Forty- 
two were secured, and another drill test similar to the ones used in the 
training period was given. The results are shown in Chart VIII. 
This chart shows the highest and lowest individual scores and the 
median and quartiles for the tests taken at the beginning of training, at 
the end of one semester’s training, and again after a lapse of a year. 
While the scores drop slightly from the end of training, it will be noted 
that the learning has persisted almost entirely for the year. It is 
not to be assumed that these students still in school had not practiced 
since the close of the training period. Most of them said they were 
using the methods of study learned in the laboratory. However, this 
would not destroy the values. If such training leaves permanent 
habits of study, the experiment is justified. 

At the close of the first semester of training and again at the close of 
the second semester, the students were asked to comment in writing 
concerning the work. They were asked to give their impressions and 
not to sign their names. Very few unfavorable comments were given.
They were also requested to comment at any time and especially in the 
personal interviews. Not all students were in sympathy, and a few 
dropped out. Some offered criticisms at times, but for the most part 
those criticisms were constructive. 

Another source of encouragement was from other instructors in the 
college. Everyone who watched the experiment was optimistic. 
Several other departments initiated study plans, based upon this
experiment, and all report very favorable progress. 

From the Personnel Director’s office came the report that only 
three students who participated in the training were reported delinquent
in any college class. Since the percentage of delinquency in other classes
was many times that of this experimental group, it is
assumed that there is at least a probability of a relationship.

From every source there is evidence of success in the experiment.
It is being continued as a part of freshman training, and further reports
may change the findings somewhat.¹

1 In a further study for the past year the data sustain the conclusions reached
in this paper.


SUMMARY AND CONCLUSIONS


1. The experiment reported above was begun in the fall of 1929 at
The Nebraska Wesleyan University in an attempt to discover whether
college students were failing because of poor reading ability and poor
study habits, and if so, to attempt to find some remedies for the
situation. This work was done with nine freshman-sophomore classes
in courses in education with three other similar classes used for
controls. The experimental group met once a week for at least one hour.
Drill tests in reading were given each time and remedial methods used.
Standardized tests were given at the beginning and at the end of
training. All students showed a very satisfactory gain during one
semester’s training. The median per cent of gain in reading
comprehension was over one hundred.

2. The records for one semester of training are shown for two
hundred five students and the records for two semesters of training
are shown for sixty-four students.

3. The results of both the drill tests and the standardized tests
show a steady increase in ability to comprehend for the experimental
group. The control group showed practically the same results on the
end test as on the initial tests.

4. No formal attempt was made to increase rate, and the words 
read per minute in both the experimental and control groups remained 
practically the same. The training group showed a very slight gain, 
but not enough to be significant. 

5. The results of the standardized test showed a statistically 
significant gain. 

6. Tests given to forty-two students one year after the close of 
the training period showed a high degree of retention of the improve- 
ment made during the experiment. 

7. The remedial work of the experiment was largely individual 
although considerable group instruction was used. 

8. College seniors in this study had no better reading habits than 
had the freshmen. 

9. The study shows that students are able to increase their 
reading comprehension as much as one hundred per cent in short 
training periods over a few weeks. These students continued to 
improve their comprehension ability for two semesters. 

10. The possibility of increased efficiency holds true for the good 
student as well as for the poor one, although not in like amounts. 

11. Students for the most part are vitally interested in such 
remedial work, and will assist in every way. Keeping the student 
informed of his progress seems to be one way of keeping interest. 

12. The fact that the standardized test used in this experiment 
probably measures some different things from those shown in the local 
comprehension drill test indicates that there is a possibility of a
transfer of training to other classes.

13. Good remedial work must include individual diagnosis and 
assistance; group methods are inadequate for the problem cases. 

14. The work reported in this paper is to be continued. A more
exhaustive study should be made of the results. A study should be
made to try to determine whether the results gained in these
experiments carry over into other college classes. More study should be
made of the permanence of the training habits.


Note.—No bibliography is included in this paper. Should the 
reader desire more material he is referred to the summaries of reading 
investigations by William S. Gray which have appeared from time to
time since 1926 in the Elementary School Journal. 
THE TEACHER’S INFLUENCE UPON THE SOCIAL
ATTITUDE OF BOYS IN THE TWELFTH GRADE 


ABRAHAM KROLL 


Fordham University 


In 1922, Manly H. Harper developed a scale by means of which 
he measured the trait conservatism-liberalism-radicalism in American 
educators.¹ The scale consists of seventy-one propositions bearing
upon important social problems which are responded to on an agree- 
ment-disagreement basis. A scoring key converts the responses into 
numerical scores, the minimum being zero and the maximum seventy- 

ne.. A low score indicates that the subject’s social attitude tends 

toward conservatism and a high score indicates his trend toward 
radicalism. On the scoring key the author gives norms for teachers 
having an educational background ranging from four years above 
the eighth grade to graduate educators in general. These norms 
vary from forty-one to fifty. Coefficients of reliability ranging from 
.715 to .904 based on four samplings of his data are furnished in 
Harper’s monograph.? 
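The scoring described above can be sketched as follows, assuming (as the range of 0 to 71 suggests) one point for each proposition answered in the keyed direction. Harper's actual key is not reproduced in this paper, so the key used below is purely hypothetical, as are the function and variable names.

```python
from typing import Mapping

N_PROPOSITIONS = 71  # number of propositions on Harper's scale


def harper_score(responses: Mapping[int, bool], key: Mapping[int, bool]) -> int:
    """Count propositions answered in the keyed direction (a high score = radicalism).

    responses: proposition number (1..71) -> True for "agree", False for "disagree".
    key: proposition number -> the response that earns a point
         (HYPOTHETICAL; the real key is not given in the article).
    Unanswered propositions earn no point.
    """
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


# Hypothetical key: agreeing with even-numbered propositions scores a point.
HYPOTHETICAL_KEY = {i: (i % 2 == 0) for i in range(1, N_PROPOSITIONS + 1)}

if __name__ == "__main__":
    all_agree = {i: True for i in range(1, N_PROPOSITIONS + 1)}
    print(harper_score(all_agree, HYPOTHETICAL_KEY))  # 35 even-numbered items
```

A subject who matched the key on every proposition would reach the maximum of 71; one who matched none would score zero.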

For many years opinions have been expressed by conservative 
teachers that radical teachers are teaching radicalism to high school 
boys and by the radical teachers that conservative teachers are 
indoctrinating conservatism. Supervisors have entered the class- 
rooms of teachers to verify these opinions at first hand. In the short 
time per visit and the relatively long intervals between visits, it was 
not possible to determine accurately or objectively whether these 
claims and counter-claims had any basis in fact. It was to evaluate 
these statements that the following preliminary investigation was 
undertaken. 

This investigation used boys in the senior year of the high school
because (a) of the groups available, this group was the most comparable
to the group on which the Harper scale had been standardized; (b) this
group had completed one year of study of the trends of modern civili-
zation in a course in European history, and thus had had an opportunity
to form an opinion on social problems; and (c) this group was studying
American history and economics.

¹ Harper, Manly H., Ph.D.: A Social Study. Published and copyrighted 1927,
Bureau of Publications, Teachers College, Columbia University, New York City.
² Harper, Manly H.: Social Beliefs and Attitudes of American Educators. Teachers
College Contributions to Education, No. 294, Bureau of Publications, Teachers
College, Columbia University, New York City.

The investigation included the classes of six different teachers. 
Three of the teachers were frankly conservative in their social attitudes 
and three were radical. The classes of these teachers were given the 
first application of the Harper scale in February (near the beginning 
of the semester), and the second application in June (toward the end 
of the semester). The purpose was to measure the average change in 
attitude of the classes of each teacher and of each group, the direction 
of the change and the statistical significance of the change, if any. 

The writer has been connected with a large boys’ high school
for more than seventeen years and had formed opinions as to who
were the most conservative and the most radical teachers of history
and of English. In cooperation with the principal of the school, six
teachers were selected for the investigation. To corroborate this
selection, the chairmen of the departments of history and of English
were consulted. Their agreement with the choice of teachers for the
experiment was complete. To further verify this selection, the
opinions of four fellow teachers in the history and English departments
were solicited. These four teachers had taught with the six selected
for not less than ten years. Their agreement with the selection was
unanimous. The selection was as follows:

                            History    English
Conservative teachers          2          1
Radical teachers               2          1

A seventh teacher (a teacher of history) was selected in a similar
manner. This teacher was chosen because he was not an extremist.
The purpose was to use his class for a study of the reliability of the
scale. The reliability of the scale was originally determined on
teachers. There was no basis for assuming that it would be equally
reliable for twelfth-grade boys; it therefore seemed best to obtain an
estimate of its reliability when so applied.

The classes that recited to these teachers during the seventh
(last school recitation) period of the day were selected because (a)
they were all convenient in size; (b) there could be no duplication
of boys; and (c) the selection of classes was random with regard to
population.
In order that the test be given in exactly the same way in each
of the classes, the writer personally replaced the teacher in each
instance and conducted the test. Before the first application of the
Harper scale in February 1933, the following statement was read
to the boys: “A great deal of opinion has been expressed concerning
the social attitudes of seniors in the high schools. Some of this has 
been favorable and some has been unfavorable. The principal has 
asked me to obtain a sample of senior opinion on important social 
problems and this class has been selected to cooperate in this study. 
It is understood that your statements on these papers will be considered 
confidential and that the results will in no way affect your class marks 
as your teacher will at no time be informed as to what any individual’s 
score or attitude is. It is for this especial reason that the principal 
has asked me to conduct this study personally and to take the papers 
away with me. Your earnest cooperation is solicited.” 

Before the second application of the Harper scale in June 1933, 
the following statement was read to the boys: “Earlier in the term 
we obtained from each of the boys of this class a score which was 
indicative of their attitude upon important social problems. We are 
in the field of education and there should be growth in ability to think
clearly and consistently on social problems if our teaching has been
effective. We are asking you to indicate your opinions of the problems 
in this ‘study’ again so that we may have a means of judging what 
change, if any, has taken place under our instruction. Please be 
frank and sincere. I can assure again, that the information given will 
be treated as confidential and will never be conveyed to your teacher 
so that it cannot affect your marks.” 

In the seventh class, where the Harper scale was applied in succes-
sive weeks (March 8 and 17), the same first statement referred to
above was read prior to the first application. Prior to the second
application, however, a frank statement was made that we were 
seeking to check the reliability of this study and we asked the boys 
to score it as honestly as they knew how. It is believed that the 
boys could not remember the way they marked the first application 
of the Harper scale because the propositions are too numerous and 
often worded in a way which requires some thought for interpretation. 
In none of the seven classes was there any implication, any statement
or any information given that a second application of the test was
forthcoming at some time in the future. Caution was exercised to
prevent such information from reaching either the teachers or the
pupils.

While it is true that the total population of this reliability study
is far too small to warrant a final conclusion, it is to be noted (even
at the expense of repetition) that this entire investigation was intended
only as a preliminary one and that indications as to reliability, etc.,
were all that were sought.

TABLE I. RELIABILITY DETERMINED BY RETEST IN A BOYS’ SECONDARY SCHOOL
(ELEVENTH GRADE) IN MARCH, 1933
First application 3/8/33; second application 3/17/33

Correlation coefficients     First application    Second application
First application                  ....                  .76
Second application                 .76                   ....

Accordingly, it appears from Table I that the reliability of the Harper
scale when applied to boys of the twelfth grade is in the vicinity of .76.
The average of the reliabilities of the alternate halves is .74. This may
be accepted as satisfactory for our purposes, particularly in view of the
reliability coefficient of .715 determined by Harper on about one
hundred thirty village teachers in a Mid-Western state. The ratio
of the difference of the means to its probable error is so much less
than four that this difference in the means may be said to be due to
chance and is therefore not significant.*
In Table II, which gives the data for the applications of the
Harper scale to the classes of the conservative teachers, it is of interest
to note that under the guidance of these teachers the average score of
each class, as well as the average of the three classes combined,
increased somewhat.

This may be taken as an indication of an average trend toward
liberalism. In one instance (Class #1) this increase approaches our
criterion of statistical significance so closely that it was thought that
the selection of this teacher as a conservative had been erroneous and
that possibly he was radical in his personal views. This possibility
led us to request that the teachers cooperate by permitting us to
determine their position on the scale. The teachers’ scores are
included in the table. Their scores place them on a level with
graduate educators according to Harper’s norms. Noting this
teacher’s score on the scale, we believe we were probably not in error
in the selection of this teacher.

TABLE II. APPLICATIONS OF HARPER’S SCALE TO THE CLASSES OF THE TEACHERS
WHO WERE CONSIDERED TO BE CONSERVATIVE IN THEIR VIEWS

                           Class #1        Class #2        Class #3     Composite total
                         First  Second   First  Second   First  Second   First  Second
                         appli- appli-   appli- appli-   appli- appli-   appli- appli-
                         cation cation   cation cation   cation cation   cation cation
                         2/33   6/33     2/33   6/33     2/33   6/33     2/33   6/33
Population                    35              27              29              91
Mean                     43.77  45.94    43.59  43.81    39.72  41.31    42.58  43.80
PE mean                    .80    .98     1.25   1.28     1.01   1.16      .59    .67
σ                         7.00   8.56     9.64   9.83     8.09   9.23     8.41   9.43
Correlation                   .82             .70             .73             .75
M₂ minus M₁                  2.17             .22            1.59            1.22
PE of M₂ minus M₁             .56             .98             .81             .45
Ratio of diff. to PE         3.87             .22            1.96            2.71
Teacher’s score                50              54              52

* Holzinger, Karl J.: Statistical Methods for Students in Education. Ginn &
Company, 1928, p. 237. “... a difference of any sort is not significant unless
it is at least four times its probable error.” The writer has used this as his
criterion of statistical significance.
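The probable errors in Table II can be checked against the standard relation PE = 0.6745 σ / √N for a mean, and 0.6745 σ / √(2N) for a standard deviation. The sketch below is our own check, not the author's worksheet, and the function names are ours. It reproduces, for example, the .80 printed for Class #1 at the first application; the second function appears to fit the PE σ row of Table III.

```python
import math

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard deviations


def pe_mean(sd: float, n: int) -> float:
    """Probable error of a mean: 0.6745 * sigma / sqrt(N)."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_sd(sd: float, n: int) -> float:
    """Probable error of a standard deviation: 0.6745 * sigma / sqrt(2N)."""
    return PE_FACTOR * sd / math.sqrt(2 * n)


if __name__ == "__main__":
    # Table II, Class #1, first application: sigma = 7.00, N = 35
    print(round(pe_mean(7.00, 35), 2))   # 0.8
    # Table II, composite total, first application: sigma = 8.41, N = 91
    print(round(pe_mean(8.41, 91), 2))   # 0.59
```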

Taking the composite figures for the conservative group, however,
we find that the difference in the means does not satisfy the criterion
set up for statistical significance. The ratio of this difference to its
probable error, 2.71, indicates a small difference that may have some
significance but is not fully dependable. How small this difference is
will be more evident after an examination of the data given in Table III,
which is concerned with the details of the applications of the Harper
scale to the classes of the radical teachers.
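The difference and ratio rows of Table II appear to follow the usual formula for the probable error of a difference between two correlated means, PE_d = √(PE₁² + PE₂² − 2r·PE₁·PE₂), combined with the criterion quoted from Holzinger that a difference must be at least four times its probable error. A sketch, with function names of our own choosing:

```python
import math

CRITERION = 4.0  # a difference must be at least 4 times its probable error


def pe_diff_correlated(pe1: float, pe2: float, r: float) -> float:
    """PE of the difference between two correlated means (same pupils tested twice)."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2 - 2 * r * pe1 * pe2)


def gain_ratio(m1: float, m2: float, pe1: float, pe2: float, r: float):
    """Return (gain, PE of gain, ratio of gain to its PE)."""
    gain = m2 - m1
    pe = pe_diff_correlated(pe1, pe2, r)
    return gain, pe, gain / pe


if __name__ == "__main__":
    # Table II, Class #1 and composite total (first and second applications)
    for label, args in [("Class #1", (43.77, 45.94, 0.80, 0.98, 0.82)),
                        ("Composite", (42.58, 43.80, 0.59, 0.67, 0.75))]:
        gain, pe, ratio = gain_ratio(*args)
        verdict = "significant" if ratio >= CRITERION else "not significant"
        print(f"{label}: gain {gain:.2f}, PE {pe:.2f}, ratio {ratio:.2f} ({verdict})")
```

The composite ratio comes out near 2.70 rather than the printed 2.71. This is apparently because the printed ratio divides by the rounded PE of .45.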

In Table III it is of interest to note that the increase in the mean 
in each instance was materially greater for the classes taught by 
radical teachers than was the increase in the mean of any class taught 
by a conservative teacher. The radical teachers’ scores approached 
the maximum limit of the Harper scale. Of outstanding interest, 
however, are the ratios of the differences in the means to their probable 
errors. In each class, without exception, this ratio is greater than 
four, despite the fact that the numbers in each instance are small. 
TABLE III. APPLICATIONS OF HARPER’S SCALE TO THE CLASSES OF THE TEACHERS
WHO WERE CONSIDERED TO BE LIBERAL IN THEIR VIEWS

                           Class #4        Class #5        Class #6     Composite total
                         First  Second   First  Second   First  Second   First  Second
                         appli- appli-   appli- appli-   appli- appli-   appli- appli-
                         cation cation   cation cation   cation cation   cation cation
                         2/33   6/33     2/33   6/33     2/33   6/33     2/33   6/33
Population                    30              31              31              92
Mean                     45.93  53.37    44.71  48.68    44.65  51.03    45.11  51.17
PE mean                   1.43   1.23     1.12   1.35     1.27   1.27      .71    .74
σ                        11.61   9.95     9.28  11.18    10.52  10.51    10.15  10.58
PE σ                      1.01    .87      .79    .95      .90    .90      .50    .52
Correlation                   .90             .78             .81             .83
M₂ minus M₁                  7.44            3.97            6.38            6.06
PE of M₂ minus M₁             .63             .85             .78             .47
Ratio of diff. to PE        11.81            4.67            8.18           12.89
Teacher’s score                69              65              68
280 The Journal of Educational Psychology 


When the composite total is taken into consideration the probable 
error decreased materially because of the increase in number, and the 
ratio of 12.89 was obtained for the difference of the means to its 
probable error. The probability that this difference in the means
(6.06) of the classes taught by the radical teachers may have been
due to chance is less than one in one hundred billion.
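One way to turn a ratio expressed in probable errors into odds is to convert it to standard-deviation units (1 PE = 0.6745 σ) and read the normal curve. The sketch below assumes a normal sampling distribution and a two-tailed reading. The author does not say which table or convention he used, so its figures need not match his exactly. It does confirm that a ratio of 12.89 lies far beyond one chance in one hundred billion, and that the four-PE criterion corresponds to odds of roughly 7 in 1,000.

```python
import math

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard deviations


def chance_probability(ratio_in_pe: float) -> float:
    """Two-tailed probability of a deviation at least this many PEs, under a normal curve."""
    z = ratio_in_pe * PE_FACTOR          # convert PE units to sigma units
    return math.erfc(z / math.sqrt(2))   # P(|Z| >= z)


if __name__ == "__main__":
    for ratio in (4.0, 12.89):
        p = chance_probability(ratio)
        print(f"{ratio} PE -> p = {p:.3g} (about 1 in {1 / p:,.0f})")
```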

The difference between the mean gain of the group taught by 
the radical teachers (which was 6.06) and the mean gain of the group 
taught by the conservative teachers (which was 1.22) amounted to 
4.84. This difference has a probable error of .64. The ratio of this 
difference to its probable error is 7.56. This signifies that the possi- 
bility that the difference in the mean gains between the two groups 
was due to chance alone was about one in ten million. 
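The comparison of the two groups' mean gains treats them as independent samples, so the probable error of the difference is presumably √(PE₁² + PE₂²), with no correlation term. The sketch below applies that formula to the composite PEs of .47 and .45. It gives about .65 and a ratio near 7.4, close to but not exactly the printed .64 and 7.56; the small gap may reflect rounding or a slightly different computation in the original.

```python
import math


def pe_diff_independent(pe1: float, pe2: float) -> float:
    """PE of the difference between means of two independent groups."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2)


radical_gain, radical_pe = 6.06, 0.47            # Table III, composite total
conservative_gain, conservative_pe = 1.22, 0.45  # Table II, composite total

difference = radical_gain - conservative_gain
pe = pe_diff_independent(radical_pe, conservative_pe)
print(f"difference {difference:.2f}, PE {pe:.2f}, ratio {difference / pe:.2f}")
```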

This study indicates the following in this school: 

(a) Harper’s “A Social Study” can be used as a fairly reliable
measure of the change in the social attitudes of high school boys of the 
twelfth grade. 

(b) There is little foundation for the statement that conservative 
teachers indoctrinate conservatism. 

(c) There seems to be some basis for the opinion that radical 
teachers are probably teaching the pupils to question the status quo. 

The writer plans to carry on further investigations in this field. 
SUGGESTION EFFECTS OF THE TRUE-FALSE TEST
CHESTER E. SPROULE 


University of Southern California 


The true-false test is often criticised because it presents many 
false statements which, it is believed, will implant erroneous informa- 
tion in the minds of the pupils taking the test. Most of these asser- 
tions, however, are based on belief rather than on experimental proof, 
for the simple reason that there is little experimental evidence availa- 
ble. The few experiments that have been made tend to show that 
these negative suggestion effects are much less important than some 
would believe. 

As these previous investigations were confined to the high-school 
and college level, it was the object of this study to determine the effects 
of these true-false test suggestion influences on younger children. 
Negative suggestion effects, positive suggestion effects, permanency 
of these effects and means of nullifying them were studied in these 
experiments. 


THE FIRST EXPERIMENT 


Seventh grade pupils in social studies classes and ninth grade 
students in general science were selected for this experiment. Each 
grade was divided, on the basis of equal reading ability, into one 
control and two experimental groups. All groups were given a fifty-
item, five-response multiple-choice test based on the subject-matter
of their respective classes. On the following day one of the experi- 
mental groups in each grade was given a true-false test on the identical 
items covered in the multiple-choice test. The other experimental 
group, in each grade, took this true-false test two weeks later. Then 
the multiple-choice test was again presented to all groups. These two 
multiple-choice tests when compared should reflect the influences 
of the intervening true-false test. Chance and practise effects were 
measured by the control groups who took both multiple-choice tests, 
but not the true-false test. 

As a check on the permanency of the changes made on the second 
multiple-choice test that were caused by the true-false test questions, 
the true-false test was again presented to all groups as a final test. 
This showed the number of suggested changes that were verified when 
the question was presented in another form. 
The average scores on the two multiple-choice tests were first
compared as in Table I. Then the two multiple-choice tests for each
individual were compared, item for item, to determine the influence 
of the intervening true-false test. Both negative and positive sugges- 
tion effects were traced in this manner. The control group tests 
were compared in the same fashion and used as a correction on the 
results found in the experimental groups. 
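The item-by-item comparison can be sketched as below. The data layout is hypothetical: for each item we assume we know the keyed correct option and the option that the true-false statement asserted (the "suggested" response). A change between the two multiple-choice tests to a suggested option counts as a positive effect when that option is correct and as a negative effect when it is not. The control group's average count of the same kind of change is then subtracted as the correction for chance and practise.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence, Tuple


@dataclass
class Item:
    correct: str    # keyed correct option, e.g. "c"
    suggested: str  # option asserted by the true-false statement (hypothetical field)


def trace_changes(items: Sequence[Item], first: Sequence[str],
                  second: Sequence[str]) -> Tuple[int, int]:
    """Return (negative, positive) suggested changes for one pupil."""
    negative = positive = 0
    for item, before, after in zip(items, first, second):
        if before == after or after != item.suggested:
            continue  # unchanged, or changed to something not suggested
        if item.suggested == item.correct:
            positive += 1  # moved to a suggested true response
        else:
            negative += 1  # moved to a suggested false response
    return negative, positive


def corrected_average(experimental: List[int], control: List[int]) -> float:
    """Average effect per pupil minus the control group's chance rate."""
    return mean(experimental) - mean(control)
```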

Results in the Ninth Grade.—When tested the day following the 
true-false test, one response in one hundred fifty was changed to 
the false response as suggested by the true-false test. One response in 
sixty was changed to the suggested true. 

Fourteen days after the true-false test had been given, one response 
in five hundred was changed to the suggested false and one response 
in eighty to the suggested true. 

The net result of these suggestion influences was positive, although 
not great. 

Most of the suggested responses were held to be true, once they
had been accepted, when the question was presented in another 
form. 

Results in the Seventh Grade.—One response in forty was changed 
to the suggested false and one in thirty-three to the suggested true 
when the multiple-choice test followed the true-false test by one 
day. 

One response in seventy was changed to the suggested false and 
one in forty to the suggested true when tested fourteen days after 
the true-false test. 

The net result of the suggestion effects was slightly positive. 

A pronounced tendency was found to retain the suggested responses 
once they had been accepted. 


THE SECOND EXPERIMENT 


The subjects for this experiment were fifth grade children in 
social studies classes, and like the previous experiment, were divided 
into one control and two experimental groups. 

The procedure followed was identical to that described for the 
first experiment, except for the fact that the children were allowed 
to correct their own true-false tests. The teacher read each question, 
announced its truth or falsity, and supplied the correct response to 
each false statement. In this manner any false impressions that may 
have been made by the true-false test were arrested and corrected. 







TABLE I. DIFFERENCE BETWEEN THE AVERAGE SCORES ON THE INITIAL
MULTIPLE-CHOICE TEST AND THE SAME TEST REPEATED WITH A
TRUE-FALSE TEST INTERVENING IN THE EXPERIMENTAL GROUPS

Grade   No.   Interval   Difference    SD of    Diff. ÷       Chances of a
              in days    of averages   diff.    SD of diff.   true diff.
I. True-false tests corrected in class
  5      14      1           6.1        2.9       2.103        98 in 100
  5      14     14           5.0        3.3       1.515        94 in 100
II. True-false tests not corrected by pupils
  7      14      1           0.1        1.8       0.055        52 in 100
  7      14     14           0.4        2.1       0.190        57 in 100
  9      13      1           0.9        2.1       0.429        67 in 100
  9      13     14           1.4        2.2       0.636        74 in 100
III. Groups not having the true-false test
  5      28                  0.9        2.0       0.450        67 in 100
  7      24                 -1.5¹       1.9       0.789        78 in 100
  9      19                  1.0        2.1       0.476        68 in 100

¹ The minus sign indicates that the average score on the second multiple-choice
test was lower than on the first.


TABLE II. NET EFFECT OF THE TRUE-FALSE TEST

Grade   No.   Interval   Average¹    Average¹    Average¹      Per cent
              in days    negative    positive    net effect    net effect
                         effect      effect
I. True-false tests corrected in class
  5      14      1         0.21        2.79        +2.58         +5.2
  5      14     14        -0.64²       2.07        +2.71         +5.4
II. True-false tests not corrected by pupils
  7      14      1         1.26        1.4         +0.23         +0.5
  7      14     14         0.68        1.20        +0.52         +1.4
  9      13      1         0.42        0.85        +0.43         +0.9
  9      13     14         0.11        0.62        +0.51         +1.0

¹ Averages are based on the number of responses influenced in a fifty-item test
and are corrected for chance.
² The minus sign indicates that the average negative effect of the true-false test
was less than chance effects in the control group.
The average scores on the multiple-choice tests were compared
as in the first experiment and the results tabulated in Table I. The
two multiple-choice tests were also compared, as previously described, 
to determine the suggestion effects of the true-false test when it is 
corrected in class. These data are partly reproduced in Table II. 
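The last two columns of Table I appear to be the ratio of the difference to its standard deviation and the corresponding normal-curve chances in 100 that the true difference lies in the observed direction. Assuming that reading, the sketch below reproduces every printed value to within one point; the 57 for Grade 7 at fourteen days, for instance, comes out at about 57.5.

```python
import math


def normal_cdf(z: float) -> float:
    """Area of the unit normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chances_in_100(diff: float, sd_diff: float) -> float:
    """Chances in 100 that the true difference has the same sign as the observed one."""
    return 100 * normal_cdf(abs(diff) / sd_diff)


# (difference of averages, SD of difference, printed chances) from Table I
TABLE_I = [(6.1, 2.9, 98), (5.0, 3.3, 94), (0.1, 1.8, 52), (0.4, 2.1, 57),
           (0.9, 2.1, 67), (1.4, 2.2, 74), (0.9, 2.0, 67), (-1.5, 1.9, 78),
           (1.0, 2.1, 68)]

for diff, sd, printed in TABLE_I:
    print(f"{diff:5.1f} / {sd:.1f}: {chances_in_100(diff, sd):5.1f} (printed {printed})")
```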

Results of the Second Experiment.—No evidence of negative sugges- 
tion effects of the true-false test was found beyond that accounted 
for by chance. 

A pronounced positive suggestion effect of from four to five and 
one-half per cent was traced to the true-false test. 
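The per-cent figures appear to express the average numbers of influenced responses in Table II as a percentage of the fifty items, with the net effect taken as positive minus negative. The fully legible rows of Table II are consistent with this reading. Under that assumption the Grade 5 fourteen-day row gives a positive effect of about 4.1 per cent and a net effect of 5.4 per cent:

```python
ITEMS = 50  # items in each multiple-choice test


def effect_percentages(avg_negative: float, avg_positive: float, items: int = ITEMS):
    """Return (positive per cent, net effect, net per cent) for one row of Table II."""
    net = avg_positive - avg_negative
    return 100 * avg_positive / items, net, 100 * net / items


positive_pct, net, net_pct = effect_percentages(-0.64, 2.07)  # Grade 5, 14 days
print(f"positive {positive_pct:.1f}%, net {net:+.2f} ({net_pct:+.1f}%)")
# positive 4.1%, net +2.71 (+5.4%)
```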

A further positive effect of from one to two and three-tenths per 
cent was noted that could not be directly attributed to the true-false 
test itself. 

There were strong indications that these suggested responses were 
being retained. 

No noticeable difference was evident between tests corrected imme- 
diately and those corrected after a period of two weeks. 


GENERAL CONCLUSIONS 


The results of these experiments are quite substantially in accord 
with those of previous investigations in that they show the negative 
suggestion effects of the true-false test to be rather slight and usually 
outnumbered by the positive suggestion effects. The present evidence 
does not justify a condemnation of the true-false test on the basis 
of false impressions that it produces. 

The practise of allowing children to correct their true-false test 
papers seems to offset practically all the negative effects and to 
contribute considerable positive knowledge. As this procedure does 
not detract from the value of the true-false test as a measuring instru- 
ment it is to be especially commended. It is probably safe to conclude 
that the true-false test, when corrected by the pupils, can be used as 
low as the fifth grade without fear of harmful suggestion effects. 
THE RELATION BETWEEN HANDEDNESS AND SOME
PHYSICAL AND MENTAL FACTORS¹


ARTHUR B. FITT AND K. H. O’HALLORAN 
Auckland University College, University of New Zealand 
The study here reported was undertaken with the hope of throwing
further light on the hereditary significance (if any) of handedness
tendencies by correlating handedness with several aspects of intel-
lectual ability, with psychopathic tendency, with height and with
speed of tapping.

The following tests for handedness were used: Cutting with scissors, 
winding cord round a stick, throwing a ball, receiving an object, easy 
reaching for an object, strenuous reaching for an object, interclasping 
hands, batting (in the air, not on the ground), taking grip measure- 
ment with the dynamometer, quick tapping on paper. In addition, 
tests for eyedness were given but the results are not presented here. 
Of the handedness tests, interclasping and easy reaching were found 
to be the least satisfactory. 

The handedness score was based on the results of nine tests, each 
applied three times. For each group used for correlations or com- 
parisons the score was derived from the same set of tests. Haefner’s 
scale was used by giving two points for each test in which one hand 
(in this study the left hand) was used three times, one point if the one 
hand was used twice, half a point if it was used once, nought if not used 
at all. 
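Haefner's scale as applied here can be written as a lookup from the number of trials (out of three) done with the left hand to points, summed over the nine tests, so that scores run from 0 to 18. The function and constant names below are ours:

```python
# Points for the number of trials, out of three, performed with the left hand
HAEFNER_POINTS = {3: 2.0, 2: 1.0, 1: 0.5, 0: 0.0}
N_TESTS = 9


def left_handedness_score(left_counts):
    """Sum Haefner points over the tests; each count must be 0..3 (KeyError otherwise)."""
    if len(left_counts) != N_TESTS:
        raise ValueError(f"expected {N_TESTS} tests, got {len(left_counts)}")
    return sum(HAEFNER_POINTS[count] for count in left_counts)


print(left_handedness_score([3, 2, 1, 0, 0, 0, 0, 0, 0]))  # 3.5
```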

Psychopathic tendency was assessed with the aid of a question-
naire which is a modification of the Woodworth-Mathews question-
naire. The following sixteen questions were presented, each to be
answered by “yes” or “no”:

1. Do you like to play with other children?
2. Do other children let you play with them?
3. Did you ever run away from home?
4. Do people find fault with you much?
5. Can you sit still without fidgeting?
6. Are you usually happy?
7. Do you ever feel that nobody likes you?
8. Do you ever get cross over very small things?
9. Do you feel that nobody quite understands you?
10. Do you usually feel well and strong?
11. Do you feel well rested in the morning?
12. Do you often have bad dreams?
13. Do you often feel bored or fed-up?
14. Do you find it hard making up your mind about things?
15. Do you often get frightened?
16. Do you get tired of people easily?

¹ This study is the work of an Educational Research Group of the Auckland
University College, University of New Zealand. Most of the members of the
group are Auckland teachers who have taken the M.A. degree in Education. Each
member of the group has contributed to the work but the collection of data and
calculation of results have been carried out chiefly by Messrs. F. R. J. Davies,
J. H. Frith, H. W. Salmon and the writers of this article. The experimental data
have been obtained in several Auckland educational institutions and we wish to
thank the staffs of these for their very helpful cooperation.
The psychopathic score was arrived at by counting one point 
for each abnormal answer to the questions. 
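The scoring rule can be sketched as follows. The article does not state which answer counts as abnormal for each question, so the key below is an assumption: "no" is taken as abnormal for questions 1, 2, 5, 6, 10 and 11, and "yes" for the rest.

```python
# ASSUMPTION: questions where "no" is the abnormal answer; "yes" is abnormal elsewhere
ABNORMAL_IF_NO = {1, 2, 5, 6, 10, 11}
N_QUESTIONS = 16


def psychopathic_score(answers):
    """answers: question number -> True for "yes", False for "no"; one point per abnormal answer."""
    score = 0
    for q in range(1, N_QUESTIONS + 1):
        said_yes = answers[q]
        abnormal = (not said_yes) if q in ABNORMAL_IF_NO else said_yes
        score += int(abnormal)
    return score
```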


I, HANDEDNESS AND INTELLECTUAL AND SCHOLASTIC ABILITY 


(a) In this and other sections the score for handedness is given in 
terms of left-handedness, so that the correlations are between the 
degrees of left-handedness and various other factors. 

Left-handedness was correlated with intelligence, in so far as this
might be represented by an IQ obtained by a New Zealand modifica-
tion of Haggerty. The group consisted of thirty-six pupils of average
age twelve and one-half years, constituting one of the first-year forms
of the Kowhai Junior High School, Auckland. A correlation of
-.22 ± .11 was obtained. Little can be inferred from this except
possibly the very slightest suggestion of the association of left-handed-
ness with rather low intelligence.

(b) For this same group left-handedness was correlated with educa-
tional ability, this being obtained from the total marks gained by each
pupil in the school-entrance examinations (internal) at the beginning
of the school-year 1931. A correlation of -.32 ± .10 was obtained.
This measure, more reliable and significant than the former, indicates
a similar kind of association.
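The probable errors attached to these coefficients agree with the usual formula PE_r = 0.6745 (1 − r²) / √N for N = 36. A quick check, as our own sketch rather than the authors' computation:

```python
import math


def pe_r(r: float, n: int) -> float:
    """Probable error of a Pearson correlation coefficient."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


for r in (-0.22, -0.32):
    print(f"r = {r:+.2f}, PE = {pe_r(r, 36):.2f}")
```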

(c) As a check on these results a survey was made of the amount 
of left-handedness found in primary-school, secondary school and 
university populations. As in New Zealand the chief difference 
between these educational levels (apart from age) is educational 
ability, transition being determined chiefly on the basis of external 
examinations, it might appear that the elimination of the lower levels 
of ability would possibly eliminate incidentally a certain amount of
left-handedness. This would be the case if there were anything in
the correlations suggested under (a) and (b).

The following questionnaire was answered by scholars at the 
three levels indicated: 


1. With which hand do you write?
2. With which hand do you throw a ball?
3. With which hand do you hold the scissors when cutting anything?
4. With which hand do you hold the knife when having your dinner?


The score used ultimately was not an individual one but that of a 
whole class or school. Each answer sheet was scored by giving a point 
(of left-handedness) for each time the left hand was used and the 
left-handedness score for the group is the total left-handed response 
calculated as a percentage of all the responses for that group. 
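The group score is a simple proportion of all responses. A minimal sketch, assuming each answer sheet is recorded as a list of "L" or "R" responses to the four questions (a hypothetical encoding):

```python
def group_left_percentage(answer_sheets):
    """Left-hand responses as a percentage of all responses in the group."""
    total = sum(len(sheet) for sheet in answer_sheets)
    if total == 0:
        raise ValueError("no responses")
    left = sum(answer == "L" for sheet in answer_sheets for answer in sheet)
    return 100 * left / total


print(group_left_percentage([["L", "L", "R", "R"], ["R", "R", "R", "R"]]))  # 25.0
```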

The primary schools were two state schools—Mt. Eden School, 
in which four hundred forty boys and girls (the main part of the 
school population between eight and twelve years of age) were tested, 
and Newton Central School, in which four hundred one boys and girls 
(the main part of the school population between eight and fourteen 
years of age) were tested. The secondary school was the Auckland 
Boys’ Grammar School, a state school. Responses were given by 
eight hundred eleven boys, this being the bulk of a school population 
ranging chiefly from thirteen to eighteen years of age. The Uni- 
versity group consisted of seven hundred undergraduates (men and 
women) of the Auckland University College, about half of the popula- 
tion, selected merely because they belonged to several of the larger 
classes, this saving time in the getting of results. 

The following are the results of the survey. The classes of the
primary school groups increase in age and seniority from left to right.
The Grammar School forms advance from Form III (lowest) to
Form VI (highest). There are several divisions in each of these
forms, corresponding largely to general scholastic ability. These
divisions read from the highest ability on the left to the lowest on the
right, as given by the headmaster’s classification. It should be noted
that other factors, such as the choice of an occasional pupil regarding
modern and classical courses, complicate this division slightly and
prevent it from being a perfectly simple arrangement according to
general ability, but these are relatively minor factors.
TABLE I. PRIMARY SCHOOLS

                        Primer   Standard  Standard  Standard  Standard   Form   Average
                          IV        I         II        III       IV       II
Mt. Eden
  No. of pupils           47        80       101        97       115               (440)
  Left-hand, per cent     5.3      12.2       9.4       4.6       5.4               7.4
Newton Central
  No. of pupils           53        52        68        63        97       68      (401)
  Left-hand, per cent     8.5       2.4      10.7       9.5       9.0      8.4      8.2


TABLE II. AUCKLAND GRAMMAR SCHOOL (SECONDARY)

                       III A  III B  M.III A  III C  III D  M.III B  M.III C  Average
No. of pupils            39     32      36      34     32      33       24     (230)
Left-hand, per cent     6.4    1.5     7.0     5.9    2.3     8.8      5.2      5.3

                       IV A   IV B   M.IV A   IV C   IV D   M.IV B   M.IV C   Average
No. of pupils            37     37      38      35     34      36       40     (257)
Left-hand, per cent     2.0    0.7     5.9     4.3    9.6     8.8      1.9      4.7

                       V A   V B   S.M.   V C   V D   V E   M.V B   V R.   M.V C   Average
No. of pupils           30    25    33     33    33    23    37      20     36     (270)
Left-hand, per cent    7.3   6.0   5.3    5.3   2.3   3.2   11.4    7.5    9.7      6.4

                       VI A   VI B   VI C   Average
No. of pupils            16     17     21     (54)
Left-hand, per cent     4.6    7.1    6.0     5.9

Note. M. indicates the Modern Forms. S.M. means Senior Modern. V R. means V Remove.


TABLE III. AVERAGES FOR THE THREE LEVELS

                        Primary   Secondary   University
No. of pupils             841        811          700
Left-hand, per cent       7.8        5.6          4.0
Thus in a rather remarkable way the percentage of left-handedness
decreases stage by stage with the transition from the primary school
to the university. This would seem to corroborate the figures above
showing a significant correlation between handedness and scholastic
ability. The elimination of pupils of poorer intellectual ability, which
undoubtedly takes place in the New Zealand schools in the transition
to a higher grade of institution, apparently brings about a corresponding
elimination in the amount of left-handedness among the scholars.

The change with age alone does not seem to be at all significant,
as a glance through the figures for the successive age-levels (i.e.,
classes) in the two primary schools and in the secondary school
will show.

A similar relation between general ability and the amount of
left-handedness appears when one correlates the percentages of
left-handedness with the form positions in each of the groups III, IV, V
and VI in the Grammar School. As indicated, these positions rank
approximately from the highest on the left (the A forms) to the lowest
on the right. Rank correlations give these results between general
ability and left-handedness:


Table IV

Form III   Form IV   Form V   Form VI
  -.07      -.33      -.39     -.52

It is not unnatural here to find an increase in the size of the coef- 
ficient with the height of the form, for there is increasing opportunity 
of grading and regrading the scholars according to ability. The very 
low coefficient in Form III is to be expected owing to the difficulty of 
grading at the outset scholars drafted from different primary schools. 
The coefficients for the two large forms IV and V are indicative of the 
most general trend. 
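The coefficients in Table IV can be checked against the division percentages of Table II. The sketch below is a minimal Python illustration, not the author's stated procedure: it assumes, as the text says, that each form's divisions are printed from highest ability on the left to lowest on the right, and it computes Spearman's coefficient as the product-moment correlation of the ranks, with tied percentages sharing their average rank. The names `DIVISION_PERCENTAGES`, `average_ranks` and `ability_vs_left_handedness` are illustrative. On these assumptions it reproduces -.07 and -.39 for Forms III and V exactly and gives -.32 and -.50 for Forms IV and VI, within .02 of the printed values.

```python
from math import sqrt

# Left-handedness, per cent, for each division in the order printed in Table II
# (assumed: leftmost division = highest general ability).
DIVISION_PERCENTAGES = {
    "III": [6.4, 1.5, 7.0, 5.9, 2.3, 8.8, 5.2],
    "IV": [2.0, 0.7, 5.9, 4.3, 9.6, 8.8, 1.9],
    "V": [7.3, 6.0, 5.3, 5.3, 2.3, 3.2, 11.4, 7.5, 9.7],
    "VI": [4.6, 7.1, 6.0],
}


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ability_vs_left_handedness(percentages):
    """Rank correlation between ability (leftmost = most able) and left-handedness."""
    n = len(percentages)
    ability = list(range(n, 0, -1))  # n for the first division, 1 for the last
    return pearson(average_ranks(ability), average_ranks(percentages))


for form, pcts in DIVISION_PERCENTAGES.items():
    print(f"Form {form}: {ability_vs_left_handedness(pcts):+.2f}")
```

The negative sign only says that the more able divisions tend to have the lower percentages of left-handedness.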

A quick glimpse of this relation can be obtained by comparing the
average left-handedness of the upper groups and the lower groups in
each form (obtained by averaging the percentages for the divisions
lying above and below the middle, or odd, division in each case).
For example, in Form IV, divisions IVA to Modern IVA constitute the
upper half and divisions IVD to Modern IVC the lower half. The
results then are:
Table V

                 Upper half                 Lower half
         No. of pupils  Per cent  No. of pupils  Per cent
Form III           107       5.0             89       5.4
Form IV            112       2.9            110       6.8
Form V             121       6.0            116       8.0
Form VI             16       4.6             21       6.0

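The per cent columns of Table V seem to be plain (unweighted) means of the division percentages in Table II, not means weighted by the number of pupils in each division. The article does not say which was used; the sketch below assumes unweighted means, leaves out the middle division of each form as the text describes, and matches all eight printed values to within rounding of the last digit. The helper name `half_averages` is hypothetical.

```python
from statistics import mean

# Left-handedness, per cent, by division in printed order (Table II).
DIVISION_PERCENTAGES = {
    "III": [6.4, 1.5, 7.0, 5.9, 2.3, 8.8, 5.2],
    "IV": [2.0, 0.7, 5.9, 4.3, 9.6, 8.8, 1.9],
    "V": [7.3, 6.0, 5.3, 5.3, 2.3, 3.2, 11.4, 7.5, 9.7],
    "VI": [4.6, 7.1, 6.0],
}


def half_averages(percentages):
    """Unweighted means of the divisions above and below the middle one.

    Every form has an odd number of divisions; the middle division is
    left out of both halves.
    """
    middle = len(percentages) // 2
    upper = percentages[:middle]
    lower = percentages[middle + 1:]
    return mean(upper), mean(lower)


for form, pcts in DIVISION_PERCENTAGES.items():
    upper, lower = half_averages(pcts)
    print(f"Form {form}: upper {upper:.1f}, lower {lower:.1f}")
```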

This relation between handedness and general intellectual and 
scholastic ability would appear to need little further support, but the 
figures of another short survey will be given. The two hundred 
sixty-seven pupils of the first-year form in the Kowhai Junior High 
School were asked this year (1932) to indicate whether they performed 
any of the operations of sawing, chopping, batting, bowling, taking 
food, sewing and cutting out with the left hand. Left-handedness 
at two such activities was regarded as significant and nineteen of the 
pupils fell into this class. The entrance examination marks of this 
form, taken at the beginning of the year 1932, were then looked up, and
it was found that the average mark for the whole form was 899 and the
average for the nineteen left-handed pupils was 831. As this brief
survey is used merely for purposes of
general corroboration, the rather considerable work of calculating 
errors of the averages and of the difference has not been undertaken. 


II. HANDEDNESS AND PSYCHOPATHIC TENDENCY 


The modified Woodworth-Mathews questionnaire described above 
was given to sixty pupils of average age twelve to thirteen years, 
comprising in the main the pupils of two classes of a school. The 
scores of this test were correlated with the scores for the degree of 
left-handedness. A coefficient of .28 ± .075 was arrived at. That is,
there is a definite, although not very high correlation between left- 
handedness and psychopathy, inasmuch as this may be measured by 
the questionnaire used. 
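The article quotes its coefficients with a ± term but does not say how that term was found. Assuming it is the classical probable error of a product-moment coefficient, PE = 0.6745(1 - r²)/√N, the sketch below evaluates it for the three coefficients reported in this article. For r = .28 with N = 60 it gives about .080, a little above the printed .075; for the two tapping coefficients, with N = 36, it gives about .11. The function name `probable_error_r` is illustrative.

```python
from math import sqrt


def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n).

    0.6745 is the normal deviate that encloses the middle 50 per cent of cases.
    """
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# Coefficients and sample sizes as reported in the article.
for label, r, n in [("psychopathy", 0.28, 60),
                    ("tapping, higher hand", -0.064, 36),
                    ("tapping, both hands", 0.22, 36)]:
    print(f"{label}: r = {r:+.3f}, PE = {probable_error_r(r, n):.3f}")
```

On this formula the psychopathy coefficient of .28 is roughly three and a half times its probable error.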

To determine how far this test does measure psychopathy, several 
teachers in different schools were asked to select two small, equal 
groups of children, the one nervous and the other not nervous, this 
distinction being made on the basis of their knowledge gained from 
general class observations. Comparisons were then made of the total 
psychopathic score of the opposed groups. The results are given
below, first the total scores of the several small groups and then the
averages for all the cases. In groups A and B there were eight
children in each of the nervous and non-nervous sections and in the
remaining groups seven children in each section. This makes a total
of one hundred two children tested, fifty-one nervous and fifty-one
non-nervous.

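Table VI

Group                               Nervous   Non-nervous
A ................................       31            14
B ................................       29            16
C ................................       13             3
D ................................       26             9
E ................................       22            13
F ................................       33            13
G ................................       19            23
Average per child ................      3.4           1.8
Psychopathic answers, per cent ...       21            11
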
Thus in all cases but one there is an appreciable agreement between 
the teacher’s selection and psychopathic score. Obviously there 
must be considerable differences in the judgment passed by teachers 
on a matter like this, and a case such as that of group G might well be
expected. On the whole the questionnaire would seem definitely to 
measure certain psychopathic tendencies. 
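The two summary rows of Table VI follow from the group totals. The sketch below assumes that the first summary row is the average number of psychopathic answers per child (each section has 51 children) and that the per cent row expresses that average as a share of the 16 questionnaire items listed in Table VII. On that reading it reproduces 3.4, 1.8, 21 and 11, and it confirms that group G is the only group in which the non-nervous section scores higher. The constant names are illustrative.

```python
# Group totals of psychopathic answers, Table VI: (nervous, non-nervous).
GROUP_TOTALS = {
    "A": (31, 14), "B": (29, 16), "C": (13, 3), "D": (26, 9),
    "E": (22, 13), "F": (33, 13), "G": (19, 23),
}
CHILDREN_PER_SECTION = 51   # 2 groups of 8 plus 5 groups of 7
QUESTIONS = 16              # items listed in Table VII

nervous = sum(n for n, _ in GROUP_TOTALS.values())
non_nervous = sum(m for _, m in GROUP_TOTALS.values())

for label, total in [("nervous", nervous), ("non-nervous", non_nervous)]:
    per_child = total / CHILDREN_PER_SECTION
    per_cent = 100 * per_child / QUESTIONS
    print(f"{label}: total {total}, per child {per_child:.1f}, per cent {per_cent:.0f}")
```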

The following table presents an analysis of these results, given in 
terms of the average psychopathic response for each of the items in 
the questionnaire, comparing the right-handed and left-handed pupils
among the sixty children used as the basis of the correlation, and the
“nervous” and “non-nervous” pupils among the one hundred two children
used to validate the psychopathy test.

Obviously the separate items in this test are not of equal value,
nor do they all seem to test the same general kind of situation. On the
whole, however, there is close similarity in the trend. It is interesting
to note that the average score for the right-handed group and for
the non-nervous group is roughly half the score of the opposed group
in each case.

On the whole there appears to be a measure of correlation between 
left-handedness and psychopathy not to be ignored. 
Table VII

                                            Right    Left    Non-
Question                                     hand    hand nervous Nervous
 1. Like play with other children .......     .00     .08     .02     .00
 2. Allowed play with other children ....     .02     .08     .00     .04
 3. Run away from home ..................     .04     .23     .04     .02
 4. People finding fault ................     .13     .31     .08     .22
 5. Sit without fidgeting ...............     .34     .54     .14     .28
 6. .....................................     .00     .23     .02     .06
 7. Feel nobody likes you ...............     .19     .39     .16     .33
 8. Cross over small things .............     .49     .62     .14     .31
 9. Feel nobody understands you .........     .26     .54     .22     .37
10. Usually feel well and strong ........     .11     .15     .04     .04
11. Feel rested in morning ..............     .13     .23     .06     .16
12. Have bad dreams .....................     .21     .54     .18     .16
13. Feel bored or “fed-up” ..............     .32     .62     .16     .41
14. Find hard making up mind ............     .30     .31     .33     .55
15. Often get frightened ................     .21     .39     .22     .37
16. Tire of people easily ...............     .11     .15     .02     .16
Number in each group ....................      47      13      51      51
Average .................................     .18     .34     .11     .22

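The bottom row of Table VII is the mean of the sixteen item values in each column. This minimal sketch recomputes it from the printed two-decimal values, so a difference from the authors' own figures in the last place would not be surprising, and then compares each group with its opposite. Both ratios come to about .53, a little over one half. The names `ITEM_SCORES` and `COLUMNS` are illustrative.

```python
from statistics import mean

# Item values from Table VII, in the order: right hand, left hand, non-nervous, nervous.
ITEM_SCORES = [
    (.00, .08, .02, .00), (.02, .08, .00, .04), (.04, .23, .04, .02),
    (.13, .31, .08, .22), (.34, .54, .14, .28), (.00, .23, .02, .06),
    (.19, .39, .16, .33), (.49, .62, .14, .31), (.26, .54, .22, .37),
    (.11, .15, .04, .04), (.13, .23, .06, .16), (.21, .54, .18, .16),
    (.32, .62, .16, .41), (.30, .31, .33, .55), (.21, .39, .22, .37),
    (.11, .15, .02, .16),
]
COLUMNS = ("right hand", "left hand", "non-nervous", "nervous")

# Column means over the sixteen items.
averages = {name: mean(row[i] for row in ITEM_SCORES) for i, name in enumerate(COLUMNS)}
for name, value in averages.items():
    print(f"{name}: {value:.2f}")

print(f"right / left: {averages['right hand'] / averages['left hand']:.2f}")
print(f"non-nervous / nervous: {averages['non-nervous'] / averages['nervous']:.2f}")
```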

III. HANDEDNESS AND HEIGHT


This relation was studied from the data of the eight hundred eleven 
cases of the Auckland Grammar School. For this purpose the boys 
were reclassified according to age. The names of those boys who 
showed in their replies a definite left-handed tendency were supplied 
to the physical instruction department of the school, which very kindly
provided their heights along with full data regarding the mean and
distribution for each of the years thirteen to sixteen. The years 
below and above this range had too few cases for comparison. The 
average height for the left-handed group of each year of age was com- 
pared with the average for the school for those ages, an average derived 
from a large number of cases. In the following table are found the 
facts of the comparison, the number of left-handed boys in each age- 
group being indicated. 

Table VIII

 Age,   Left-hand   School      Left-hand   Left-hand cases   Left-hand cases
years     boys     mean, in.   mean, in.     below mean        above mean
  13       13        61          60             10                  3
  14       19        63.5        62.5           10                  9
  15       21        65          65.5            6                 14
  16       21        67          66.5           12                  7

Taken alone, these differences between the norms for the age-levels
and the averages for the left-handed groups mean nothing. The
probable errors of the differences were calculated in each case, and
they all lie very close to .3. With regard to this error, the
differences of one inch in the 13-year and 14-year groups might be
regarded as of some significance, although not perfectly satisfactory
statistically. The differences for the 15-year and 16-year groups are
not as reliable as these. Perhaps it is not unimportant that in three
of the four groups the left-handed boys have lower averages. The
changes in the figures for the different age-groups suggest at least a
possible interesting fact in development. May there be in the 15-year
group the effect of the adolescent spurt?

On the whole there does appear to be some likelihood that the
left-handed boys at certain age levels are shorter than the average of 
their age. 
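The reasoning above compares each height difference in Table VIII with its probable error, which the text puts at about .3 in. in every age-group. A minimal sketch of that comparison, assuming a common probable error of exactly .3 for all four groups (the article gives only the approximate value), puts the one-inch differences at 13 and 14 years at about 3.3 probable errors and the half-inch differences at 15 and 16 years at about 1.7. The names `HEIGHTS` and `PE_DIFFERENCE` are illustrative.

```python
# Table VIII: age -> (school mean, left-handed mean), inches.
HEIGHTS = {13: (61.0, 60.0), 14: (63.5, 62.5), 15: (65.0, 65.5), 16: (67.0, 66.5)}
PE_DIFFERENCE = 0.3  # assumed common probable error of each difference, inches

for age, (school, left) in HEIGHTS.items():
    diff = school - left  # positive when the left-handed boys are shorter
    print(f"age {age}: difference {diff:+.1f} in., {diff / PE_DIFFERENCE:+.1f} PE")
```

Since one probable error equals 0.6745 standard errors, a difference of 3.3 PE corresponds to a critical ratio of about 2.2.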


IV. HANDEDNESS AND TAPPING 


This comparison was made upon measurements obtained with the 
thirty-six boys of the Kowhai Junior High School referred to above. 
The tapping test was confined to speed, each test lasting for ten 
seconds, during which as many dots as possible were made on a piece 
of paper with a pencil. Three such tests were given for each hand, 
and from these the average tapping scores for right and left hand were 
obtained. 

On glancing through these scores it was found that the right hand 
scores were the higher in all cases but three. Of these three, two of 
the boys were very markedly left-handed and were the only members 
of the class found writing with the left hand. The third showed some 
tendency to the left hand, but not very much. For the purpose of correlation
with handedness scores, it was decided to take in each case the higher 
score for tapping, that is, the right-hand score except in these three 
cases in which the left-hand score was taken. 

The coefficient of correlation stands at -.064 ± .35. From this
no conclusion can be drawn. 
Examination of the right and left scores of the right-handed and
left-handed boys suggested that those of the latter were more alike
than those of the former. The evident explanation is that the left-hand
boys, through fuller use of both hands, are becoming somewhat
ambidextrous. The average difference between the right and left
scores for the right-hand group is 12.1, and for the left-hand group
6.7. The mean variation of the difference for the whole class is 4.7.
As a result of this approaching ambidexterity of one group, the sum
of the right and left scores is higher for the left group than for the
right. Hence a correlation between the total tapping score for both
hands and the left-handedness score gives .22 ± .1.

The tendency toward ambidexterity of the left-hand group suggests
that, if the contention of some people that handedness is an
acquired characteristic is valid, the left-handed pupils who write
with the right hand might be expected to show equally good or perhaps
better right-hand scores for tapping as compared with the right-handed
pupils. Of this group of thirty-six there are twenty who are dominantly
right-handed. Their average right-hand score for tapping is
60.6 ± .87. The average right-hand score of the thirteen with a
left-handed tendency who nevertheless write with the right hand is
58.6 ± .69. This gives a difference of 2 ± 1 in favor of the right-handed
group. It is true that this difference is subject to a rather
large error, but the tendency of the results at least suggests some
support for the hereditarian argument for handedness.
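The ± 1 attached to the difference of 2 agrees with the usual rule for the probable error of a difference between two independent means: the square root of the sum of the squared probable errors. The sketch below applies that rule to the printed values. It assumes the two groups are independent samples (they are different boys), which the article implies but does not state, and `pe_of_difference` is an illustrative name.

```python
from math import sqrt


def pe_of_difference(pe_a, pe_b):
    """Probable error of the difference between two independent means."""
    return sqrt(pe_a ** 2 + pe_b ** 2)


right_mean, right_pe = 60.6, 0.87   # 20 dominantly right-handed boys
left_mean, left_pe = 58.6, 0.69     # 13 left-tendency boys writing with the right hand

diff = right_mean - left_mean
pe_diff = pe_of_difference(right_pe, left_pe)
print(f"difference {diff:.1f} +/- {pe_diff:.2f} ({diff / pe_diff:.1f} PE)")
```

The difference is thus only about 1.8 times its probable error, in keeping with the remark that it is subject to a rather large error.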


V. SUMMARY


1. The data show a definite correlation between handedness and 
scholastic ability. Left-handedness seems to be associated with 
relatively low scholastic ability. 

2. With a group of children of average age twelve to thirteen years, 
the dominantly left-handed seem to be somewhat more psychopathic 
than the dominantly right-handed. 

3. Left-handed boys of thirteen and fourteen years seem to be 
somewhat shorter than the right-handed. There is a slight tendency 
to reversal at fifteen years with an equally slight tendency to physical 
dominance of the right-handed group at sixteen years. 

4. No clear relation was found between handedness and speed of 
tapping except that dextrosinistrals do not appear to be equal to 
dextrals on their right-hand score.
VI. DISCUSSION


The above findings support the view that handedness is in the 
main an inherited tendency. Such relations between it, scholastic 
ability and psychopathy would seem to have little meaning on any 
other view. 

Regarding the correlation found between handedness and psycho- 
pathic tendency, also indicated by other investigators (e.g. Cuff), 
it might be pointed out that the cause of much of the reported nervous- 
ness and stammering associated with the change-over from left-hand 
to right-hand activity, should perhaps be sought either directly or 
indirectly in the individual’s inheritance. 

In conclusion we would note that in view of the many possible 
complexities of development, the suggested correlations under height 
and psychopathy should be thoroughly checked at several age-levels
before any conclusion is drawn.
GRADE DIFFERENCES IN ATTITUDINAL REACTIONS
OF SIX-YEAR SECONDARY SCHOOL PUPILS 


ORLIE M. CLEM 
Teachers College, Syracuse University 


AND 


MARCUS SMITH 
High School, Morristown, New Jersey 


“In a democratic society, ethical character becomes paramount 
among the objectives of the secondary school,” said a now famous 
committee. Fifteen years have passed, and little effort has been 
made to determine whether the secondary school is realizing this 
fundamental objective which the celebrated committee declared to be 
the one thing needful. 

If the cardinal objective of ethical character is being realized,
twelfth grade pupils should exemplify improved behavior and attitudes
toward behavior, when compared with pupils in earlier grades.
Ideally, bona fide conduct would be the optimum test rather than
attitudes toward conduct; the former, however, is difficult to measure
with any considerable number of pupils. The Tenth Yearbook¹
says, “Opinion measures seem to be the most readily constructed and
safely interpreted of all character tests.” To determine the relative
attitudes of secondary school pupils of different grades toward some
specific “moral situations” is the purpose of the study here summarized.
The study measures in no way the objective behavior of pupils in 
specific situations, but provides a measure of pupil attitude toward 
behavior. The study does not trace consecutively attitudinal changes 
for individual pupils, but for grade groups. It does not isolate the 
influence of pupil elimination. No element of dogmatism or finality 
is claimed for the study. It represents one modest attack upon a 
baffling problem. 

The study was conducted in the Morristown High School. This
school seemed well suited to the study, being a six-year secondary
school, grades VII–XII, and having a cosmopolitan pupil population.
At the close of the year 1930–1931 a questionnaire involving fifteen
items was administered to eleven hundred seventy-two pupils. In
each item of the questionnaire, a choice was offered among “moral
situations.” The questionnaire had previously been administered
to some sixth grade pupils to guarantee that it was simple and clear
enough for seventh grade pupils. Questionnaires were unsigned, and
pupils knew their term marks had already been determined.

¹ Tenth Yearbook, Department of Superintendence, National Education
Association, 1930.


Factor I.—Write the following in a list in order of badness: Extravagance, 
dancing, drinking, gambling, vulgar talk, playing cards on Sunday, conceit, 
gossip, smoking, stealing, lying, cheating, swearing, selfishness. 


In considering this factor, the item given first rank was assigned
a numerical value of one, the item given second rank a value of two, and
so on, so that a lower average indicates that an item was judged worse.
The average ranking for each item for each grade was determined.


Table I.—Average Rankings for Items in Factor I

                                      Grade
Item                 VII    VIII      IX       X      XI     XII
Stealing .......    2.47    2.29    2.04    1.76    1.91    2.02
Gambling .......    3.38    3.58    3.33    4.30    4.00    5.16
Drinking .......    3.62    3.88    4.39    4.41    4.34    4.88
Cheating .......    5.38    5.27    4.42    3.71    4.23    3.83
Lying ..........    5.62    5.24    4.51    4.15    4.28    4.11
Swearing .......    5.65    6.29    5.72    5.79    6.64    6.66
Vulgar talk ....    6.54    5.78    6.44    6.02    5.70    5.44
Cards on Sunday     8.87    9.09    9.10    9.84   10.34   10.93
Selfishness ....    9.47    9.84    8.55    8.86    8.82    7.87
Conceit ........    9.55   10.14    9.04    8.73    9.09    8.13
Smoking ........    9.69    9.85   10.22   10.17   10.73   10.78
Gossip .........    9.98       …    9.27    8.67    8.22    7.50
Extravagance ...   10.45   10.85   10.71   10.62   10.33    9.38
Dancing ........   12.15   12.72   12.49   12.41   12.65   12.97

Table I shows that stealing is considered the worst of the fourteen
items in every grade. Gambling, by contrast, changes markedly: senior
division pupils consider gambling much less serious than do junior
division pupils. The attitude toward drinking is progressively more
tolerant from grade to grade. Cheating is considered more serious by
senior than by junior division pupils. The data for lying are in general
accord with those for cheating. It is interesting that eleventh grade
pupils are less scrupulous as to the “badness” of cheating and lying
than are the grades immediately above and below. In general the
attitude toward swearing is one of progressive toleration; the unusual
toleration of the eighth grade is striking. The data for vulgar talk show
no steady trend, although the two upper grades react most unfavorably.
Playing cards on Sunday is considered progressively less “bad” from
grade to grade through the twelfth grade. Grades VII and VIII
condemn selfishness far less than the upper grades, particularly grade
XII. The data for conceit show no steady grade-to-grade trend, although
pupils in the senior division condemn the trait more severely than do
junior division pupils; as in the case of selfishness, twelfth grade pupils
are comparatively more critical of conceit. There is a progressive attitude
of toleration toward smoking. The attitude toward gossip is just the
reverse; twelfth grade pupils are most severe. Reaction toward
extravagance shows slight variation from grade to grade, except in
grade XII, whose pupils are most sensitive to this item. Dancing
in every grade is considered the least “bad” of all the traits, with
a slight tendency toward even greater liberality in later grades.
Evidently, all pupils see little evil in dancing.


Table Ia.—Significance of Differences between Terminal Grades

                   Average    Average                          Chances
                   ranking,   ranking,                         in one
Item               grade VII  grade XII  Difference   D/SDD    hundred
Stealing .......     2.47       2.02         .45       2.05         98
Gambling .......     3.38       5.16        1.78       6.30        100
Drinking .......     3.62       4.88        1.26       3.57        100
Cheating .......     5.38       3.83        1.55       5.86        100
Lying ..........     5.62       4.11        1.51       5.06        100
Swearing .......     5.65       6.66        1.01       3.95        100
Vulgar talk ....     6.54       5.44        1.10       3.79        100
Cards on Sunday      8.87      10.93        2.06       6.02        100
Selfishness ....     9.47       7.87        1.60       5.06        100
Conceit ........     9.55       8.13        1.42       4.73        100
Smoking ........     9.69      10.78        1.09       3.07        100
Gossip .........     9.98       7.50        2.48       7.84        100
Extravagance ...    10.45       9.38        1.07       2.82      99.75
Dancing ........    12.15      12.97         .82       2.76      99.72

Table Ia shows the statistical significance² of the differences between
grades VII and XII in the average rankings of Table I; only the terminal
grades are used. A quotient of three or more is considered to indicate a
significant difference. Table Ia shows that for eleven of the fourteen
items, a significant difference exists; for the other three items, the
probability of a significant difference is ninety-eight chances in a
hundred or better.

² The formulas used were:

$$\mathrm{Significance} = \frac{D}{\mathrm{SDD}}, \qquad \mathrm{SDD} = \sqrt{(\mathrm{SDM}_1)^2 + (\mathrm{SDM}_2)^2}.$$
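The last column of Table Ia can be recovered from the D/SDD quotients if it is read as the one-tailed normal probability, per hundred, that the true difference lies in the observed direction. The article does not state this step, so the sketch below is an assumption: it uses the standard normal distribution function Φ(z) = ½[1 + erf(z/√2)], available through Python's `math.erf`. It gives 98 for the quotient 2.05 and about 99.76 and 99.71 for 2.82 and 2.76, close to the printed 99.75 and 99.72, and every quotient of three or more rounds to 100. The names `chances_in_hundred` and `QUOTIENTS` are illustrative.

```python
from math import erf, sqrt


def chances_in_hundred(quotient):
    """One-tailed normal probability, per hundred, that a difference of
    `quotient` standard errors is greater than zero: 100 * Phi(quotient)."""
    return 100 * 0.5 * (1 + erf(quotient / sqrt(2)))


# D/SDD quotients from Table Ia.
QUOTIENTS = {
    "Stealing": 2.05, "Gambling": 6.30, "Drinking": 3.57, "Cheating": 5.86,
    "Lying": 5.06, "Swearing": 3.95, "Vulgar talk": 3.79, "Cards on Sunday": 6.02,
    "Selfishness": 5.06, "Conceit": 4.73, "Smoking": 3.07, "Gossip": 7.84,
    "Extravagance": 2.82, "Dancing": 2.76,
}

for item, q in QUOTIENTS.items():
    flag = "significant" if q >= 3 else ""
    print(f"{item:16s} {q:5.2f} {chances_in_hundred(q):7.2f} {flag}")

print(sum(q >= 3 for q in QUOTIENTS.values()), "of", len(QUOTIENTS), "items reach 3")
```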

Table Ib.—Rank Order Position of Items in Table I

                               Grades
Item               VII  VIII    IX     X    XI   XII
Stealing .......     1     1     1     1     1     1
Gambling .......     2     2     2     4     2     5
Drinking .......     3     3     3     5     5     4
Cheating .......     4     5     4     2     3     2
Lying ..........     5     4     5     3     4     3
Swearing .......     6     7     6     6     7     7
Vulgar talk ....     7     6     7     7     6     6
Cards on Sunday      8     8    10    11    12    13
Selfishness ....     9     9     8    10     9     9
Conceit ........    10    12     9     9    10    10
Smoking ........    11    10    12    12    13    12
Gossip .........    12    11    11     8     8     8
Extravagance ...    13    13    13    13    11    11
Dancing ........    14    14    14    14    14    14

Table Ib presents the data of Table I in different form. Table 
Ib shows the rank order of each item by grade, and is helpful in 
clarifying and supplementing Table I. 
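Each column of Table Ib is the order of the fourteen average rankings in the corresponding column of Table I, from lowest average (judged worst) to highest. A minimal sketch for grade XII, where no two averages tie, reproduces that column of Table Ib; `rank_positions` is an illustrative helper, and ties would need a rule the article does not give.

```python
# Grade XII average rankings from Table I (lower = judged worse).
GRADE_XII = {
    "Stealing": 2.02, "Gambling": 5.16, "Drinking": 4.88, "Cheating": 3.83,
    "Lying": 4.11, "Swearing": 6.66, "Vulgar talk": 5.44, "Cards on Sunday": 10.93,
    "Selfishness": 7.87, "Conceit": 8.13, "Smoking": 10.78, "Gossip": 7.50,
    "Extravagance": 9.38, "Dancing": 12.97,
}


def rank_positions(averages):
    """Map each item to its position (1 = lowest average) within one grade."""
    ordered = sorted(averages, key=averages.get)
    return {item: position for position, item in enumerate(ordered, start=1)}


positions = rank_positions(GRADE_XII)
for item in GRADE_XII:  # keep Table I's row order
    print(f"{item:16s} {positions[item]:2d}")
```

Applied column by column, the same helper yields the other columns of Table Ib wherever all fourteen averages are legible.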


Factor II.—If you are given too much change and do not discover it until you
have left the store, you should (1) take it back (2) keep it (3) give it to charity.


Table II.—Making Change

Table II shows little significant variation from grade to grade.
The range for Item 1 is from 93.4 per cent in grade XII to 97.5 per cent
in grade X. Pupils in all grades almost unanimously agree that change
received by mistake should be returned.


Factor III.—If you know your neighbor to be breaking the law, you should (1) 
do nothing about it (2) tell all your friends (3) take some measure to stop him. 


Table III.—Law-Breaking

Grade                 VII                 VIII                  IX
Item             1      2      3      1      2      3      1      2      3
Total ......     …      4    161      2      0    160      7      0    244
Percentage     3.0    2.3   94.7    1.3      0   98.7    2.8      0   97.2

Grade                  X                   XI                  XII
Item             1      2      3      1      2      3      1      2      3
Total ......     …      1    194     13      1    206     14      0    153
Percentage     4.0    0.5   96.5    5.9    0.5   93.6    8.4      0   91.6

Table III shows that over 90 per cent of pupils in all grades favor
taking some action to stop a law-breaker. A small but increasing
minority favors ignoring a neighbor’s law-breaking. In the upper five
grades the choice “tell all your friends” practically disappears. These
data support the finding under Factor I of a greater distaste for
gossip in the later grades.
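The Percentage row of Table III appears to be each item's count divided by the number of pupils in the grade who answered. As a minimal sketch, the grade XI and XII counts, where all three counts are legible, reproduce the printed percentages; `response_percentages` is an illustrative name.

```python
def response_percentages(counts):
    """Convert the counts for Items 1-3 into per cent of all respondents."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


# Table III (law-breaking) counts for Items 1, 2, 3.
print(response_percentages([13, 1, 206]))  # grade XI  -> [5.9, 0.5, 93.6]
print(response_percentages([14, 0, 153]))  # grade XII -> [8.4, 0.0, 91.6]
```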


Factor IV.—If you are in business, it is (1) excusable (2) perfectly all right (3) 
wrong, to exaggerate the quality of your goods. 


Table IV.—Business Ethics

Grade                 VII                 VIII                  IX
Item             1      2      3      1      2      3      1      2      3
Total ......     …      …      …      8     41    113     22     38    189
Percentage     7.2   25.3   67.5    4.9   25.3   69.8    8.8   15.2   76.0

Grade                  X                   XI                  XII
Item             1      2      3      1      2      3      1      2      3
Total ......    29     28    146     50     28    142     40      8    119
Percentage    14.5   14.0   71.5   22.7   12.6   64.7   24.0    4.8   71.2

Table IV shows that the percentages for Item 3, “wrong to exaggerate
the quality of your goods,” fluctuate around 70. The range is from
64.7 in grade XI to 76.0 in grade IX. No consistent differences exist
between any groups of grades. It is noteworthy that approximately
30 per cent of pupils in every grade do not consider it wrong to
exaggerate in business. In grade VII, 7.2 per cent consider exaggeration
excusable and 25.3 per cent consider it all right; in the twelfth grade,
24.0 per cent consider it excusable and 4.8 per cent all right. It appears
evident that in terms of business ethics, twelfth grade pupils are
influenced more by expediency than are seventh grade pupils.

Factor V.—If a sign says the speed limit is twenty miles an hour and there is
no policeman in sight, you should (1) drive at twenty miles an hour (2) drive as
fast as you can (3) drive at any speed that seems safe under the circumstances.


Table V.—Traffic Courtesy

Grade                 VII                 VIII                  IX
Item             1      2      3      1      2      3      1      2      3
Total ......     …      1     40    141      2     19    175      0     76
Percentage    75.9    0.6   23.5   87.0    1.3   11.7   69.6      0   30.4

Grade                  X                   XI                  XII
Item             1      2      3      1      2      3      1      2      3
Total ......     …      0     82    126      2     92     99      1     67
Percentage    59.9      0   40.1   60.7    0.9   38.4   59.2    0.6   40.2

Table V reveals a decided shift in attitude during the secondary 
school period. The majority in all groups check Item 1, but approxi- 
mately 40 per cent in the three upper grades consider that the driver 
should exercise his own judgment. It might be of interest to legisla- 
tors to know that a solid block of pupils, two-fifths of the senior high 
school group, recognize no obligation to obey posted speed limits, but 
prefer to use their own discretion. Upper grade pupils tend to con- 
sider the spirit rather than the letter of the law. 


Factor VI.—If a man has been in prison but is trying to go straight, you should 
(1) let him alone (2) warn others of his record (3) help him to get a fresh start. 


Table VI.—Humanitarianism

Grade                 VII                 VIII                  IX
Item             1      2      3      1      2      3      1      2      3
Total ......     …      …      …      6      3    153      9      0    242
Percentage     7.1    5.8   87.1    3.7    1.8   94.5    3.6      0   96.4

Grade                  X                   XI                  XII
Item             1      2      3      1      2      3      1      2      3
Total ......     …      1    200      4      3    213      0      0    167
Percentage     1.0    0.5   98.5    1.8    1.4   96.8      0      0  100.0

Table VI indicates a rather consistent growth in social humanitarian
outlook through grades VII–XII. In grade VII, 87.1 per cent check
Item 3, as contrasted with 100 per cent in grade XII. All grades
overwhelmingly check the most humanitarian choice.


Factor VII.—In business, the best way to fix your prices is (1) to charge as
much as you can get (2) to ask prices that will give you a fair profit (3) to ask
prices that will be lower than your rivals’ prices.


Table VII.—Business Ethics

Table VII indicates that approximately 90 per cent of pupils in all
grades check Item 2. Ten per cent of seventh grade pupils favor
underselling. This percentage gradually decreases until there is a sudden
drop from 7.7 per cent in grade XI to 3.6 per cent in grade XII. A twelfth
grade economics course may have influenced the change. “To charge as
much as you can get” was rather consistently ignored throughout all grades.


Factor VIII.—Swearing is permissible (1) when there is no one present who will 
be offended (2) never (3) when you are very angry. 
Table VIII.—Swearing

Table VIII shows that almost 90 per cent of pupils check Item 2,
indicating that they consider swearing inherently wrong. It is note- 
worthy that 4.8 per cent in grade VII and 20.4 per cent in grade XII 
check Item 1, showing that they consider swearing a matter not of 
inherent wrong but of social courtesy. Considerable fluctuation 
exists on Item 3, with greater emphasis at the beginning and end of the 
secondary school. The data for the last two years are in accord with 
Factor I, showing profanity to be considered less serious by upper 
grade pupils. 

Factor IX.—It is excusable to tell a lie (1) never (2) when it will benefit someone
(3) when it can’t be found out. 


Table IX.—Lying

Table IX shows a decreasing tendency in the secondary school
to regard lying as an absolute moral wrong. Item 1 receives 85.3 per
cent of votes in the seventh grade, and 62.8 per cent of votes in the
twelfth grade. Item 2 shows a corresponding rise, while Item 3 has
little support throughout the six grades. Factor IX appears to illustrate
early indoctrination yielding to social pressure and experience.
Practically all children are taught that lying is always wrong, but
during this six-year period they tend toward a less austere point of view.


Factor X.—It is all right to cheat in an examination (1) if the whole class does 
it (2) never (3) when it saves you from failing. 


Table X.—Cheating

Table X shows no significant trend in any group of grades. Item 2
ranges from 92.9 per cent in grade VII to 98.5 per cent in grade X.
Almost unanimous opinion agrees that cheating in an examination is
wrong. Item 1 receives more support among lower grade pupils,
indicating a greater tendency to justify cheating in terms of the
group.


Factor XI.—Drinking is (1) never right (2) all right at any time (3) all right 
once in a while. 


Table XI.—Drinking

Table XI shows a definite change in the attitude of pupils during
the secondary period. Item 1 shows a consistent decrease for the
six-year period. Factor XI is very interesting in terms of recent
liquor agitation. It seems reasonable to assume that Item 1 represents
the “bone-dry” point of view; Item 2 the “wringing wet” point of view;
Item 3 the moderate point of view. With this interpretation, more than
two-thirds of the pupils in all grades are “bone-dry”; the moderate point
of view shows a consistent increase through the secondary school; the
“wringing wets” are only a small percentage throughout. From these data,
it is evident that the increasing moderate point of view is recruited
from the “bone-dry” rather than from the “wringing wet” wing. Factor XI
no doubt represents another example of early indoctrination yielding to
later social forces.


Factor XII.—It is excusable to break a law (1) never (2) when a great many
people are disregarding it (3) when you can be sure of escaping punishment.


Table XII.—Law-Breaking

Table XII reveals no significant tendency through the various 
grades. It is noteworthy, however, that a minority in each grade 
believes that laws may sometimes be broken “excusably.” The size
of this group is 11.4 per cent in the senior year. 


Factor XIII.—True courtesy comes from (1) good home training (2) con- 
sideration for others (3) desire to make a good impression. 


Table XIII.—Courtesy

Table XIII shows a consistent shift in attitude relative to the basis
of courtesy. A consistent decrease in percentage for Item 1, from 85.9
per cent in grade VII to 58.7 per cent in grade XII, shows that later
grades are more inclined to distinguish between the form and the
spirit. In grade VII, only 10 per cent of pupils regard “consideration
for others” as the true basis of courtesy; in grade XII, the percentage
is 39.5. The fact, however, that three-fifths of pupils in the senior
year consider home training to be the true basis of courtesy shows the
tremendous influence of early indoctrination.


Factor XIV.—A proof of good character is (1) going to church regularly (2) 
giving money to the poor (3) treating others fairly. 


TABLE XIV.—CHARACTER

Table XIV shows a consistent improvement in selecting the most important element in conduct as pupils progress through the secondary school. Item 3 increases from 74 per cent in grade VII to 98.8 per cent in grade XII. This factor illustrates another indoctrinated attitude yielding before the pupil’s own experience. Seventeen per cent of pupils check Item 1 in the seventh grade, but in no grade above the eighth does this item receive more than 3.5 per cent.
Factor XV.—If you are beaten in a game, you should (1) explain why you were beaten (2) play again, and try to win (3) not play again with the same person.

TABLE XV.—SPORTSMANSHIP

Table XV shows that pupils in all grades overwhelmingly accept Item 2. The range is from 86.5 per cent in grade VII to 98.6 per cent in grade XI. Pupils in the seventh and eighth grades are more inclined to “explain” away their defeats. The secondary school period appears to develop good sportsmanship, in that opinion crystallizes around Item 2.


CONCLUSIONS 


1. The attitude of pupils toward such personal habits as swearing, 
drinking, gambling, and playing cards on Sunday, becomes more 
tolerant in succeeding grades of the six-year secondary school. In 
general, the reverse is true for cheating, lying, conceit, vulgarity, 
selfishness, gossip, and extravagance. 

2. In terms of “badness,” stealing is uniformly considered by all grades the worst of all items studied, and dancing the least “bad.”

3. It is evident throughout the study that lower grade pupils are more inclined to make choices on the basis of indoctrination than are upper grade pupils. Upper grade pupils exhibit better social discrimination and judgment, and less ingrained respect for law.

4. In terms of law observance, upper grade pupils are more 
inclined to substitute personal judgment for blind obedience: the spirit 
for the letter of the law. 

5. As revealed by the data of the study, pupils in succeeding grades 
of the six-year secondary school exhibit increased humanitarianism. 













Grade Differences in Attitudinal Reactions 309 


6. Standards of judging business conduct appear relatively high. 
Exaggeration in business for the purpose of expediency is condoned to 
a greater degree by upper grades. 

7. The succeeding grades of the six-year secondary school reveal 
increasingly higher standards of sportsmanship. 

8. In terms of the interpretations of this study, more than two- 
thirds of the pupils in all grades are “bone-dry.” The percentage 
decreases consistently through the succeeding grades of the secondary 
school. 











































THE FACTOR THEORY AND ITS TROUBLES. 
V. ADEQUACY OF PROOF 


C. SPEARMAN

1. “MUST” OR “MAY”?

Summarizing the previous articles of this series, it may be recalled that the first three dealt with the troubles that had arisen for the theory of Two Factors from statements that this theory had been ill-supported by actual observation. Such statements were shown in those articles to be devoid of foundation. Proof was easily given that they derived mainly from a gross misrepresentation of the theory, and that in truth theory and observation had agreed throughout. In the fourth article of the series we considered a criticism of a different nature. This, far from denying that the factors had been supported by observation, admitted this support to have been excellent. On the other hand, however, it sought to impeach these factors themselves, and especially g, of failing to be constant or “unique.” In its extreme form it gave rise to such assertions as the following:
When a g is found it may or may not be the g which another found yesterday. 


The individual has as many g’s as you administer tests. It is therefore not a basic 
constant of the individual. 


This objection had been entirely based upon mathematical considera- 
tions. Accordingly, it was in the said article shown by mathematics 
to be invalid. The g when properly determined remains constant. 

The trouble we have to consider in the present article concerns neither the verification nor the constancy of the factors. It is not so much a matter of observation or of mathematics as of logic. It seeks to show that, however good the verification and perfect the constancy of the factors, still the whole line of proof is fundamentally inadequate.

One of the most notable instances of such an objection has been as 
follows. The criterion of vanishing tetrads, it has been said, does not 
prove that the variables (ability scores or otherwise) must be divided 
into two factors g and s, but only that they may be so divided. Hence, 
it is argued, the criterion does not show the theory of two factors to be 
necessarily true, but only to be possibly so. This objection, besides 
filling the pages of some inferior writers, has even found expression with 
authors whom we admit to be competent and fair-minded.
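The criterion itself is easy to state in code. In the minimal sketch below, the g-saturations are hypothetical values chosen only for illustration, and the function names are likewise illustrative. The sketch builds the correlations that a pure g-and-s structure implies (r_ij = g_i g_j for distinct tests) and then evaluates the three tetrad differences for every set of four tests. Each one reduces to g_a g_b g_c g_d minus the same product, so all of them vanish up to rounding error.

```python
from itertools import combinations


def two_factor_correlations(g):
    """Correlations implied by g and s alone: r_ij = g_i * g_j for distinct tests."""
    n = len(g)
    return [[1.0 if i == j else g[i] * g[j] for j in range(n)] for i in range(n)]


def tetrad_differences(r):
    """The three tetrad differences for every set of four distinct tests."""
    diffs = []
    for a, b, c, d in combinations(range(len(r)), 4):
        # Four tests can be split into two pairs in three ways; compare the splits pairwise.
        diffs.append(r[a][b] * r[c][d] - r[a][c] * r[b][d])
        diffs.append(r[a][b] * r[c][d] - r[a][d] * r[b][c])
        diffs.append(r[a][c] * r[b][d] - r[a][d] * r[b][c])
    return diffs


# Hypothetical g-saturations for six tests (illustrative, not observed data).
g = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
r = two_factor_correlations(g)
largest = max(abs(t) for t in tetrad_differences(r))
print(f"largest |tetrad difference| = {largest:.1e}")
```

A table that lacks this structure, for instance one with two clusters of tests, gives tetrad differences well away from zero.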

Nevertheless, against this objection it may be urged decisively that, even if it were really valid, the theory of two factors would still stand on the same footing as the great majority of scientific theories, including even those of physics. In general, these theories are not, strictly speaking, proved, but merely verified; the observations are only shown to be such as the theory would lead us to expect. For example, no one has ever really proved that matter is made up of molecules; the procurable evidence has not gone beyond such observations as that two substances sometimes seem to occupy one and the same place. This is just what should seem to happen if the molecular theory were true. But at any moment a new and less accommodating observation may come along which puts the theory into mortal danger.

Such a downfall was the well-known fate of the long-accepted 
Newtonian corpuscular theory of light. This, after being success- 
fully verified by observations in immense number and variety, at 
last broke down upon some new ones (those of interference) and 
thereupon had to yield place to the rival theory of waves. Later on, 
this in its turn was confronted by further incompatible observations, 
and had itself to give way to a new form of corpuscular theory. Such evidence, liable to be eventually upset, may be described as being only of the “lower degree.”

Now, in this respect the theory of Two Factors has been exceptionally fortunate. Consider the two propositions:

(A) The divisibility into g and s proves the zero tetrads; (B) the zero tetrads prove the divisibility into g and s. Not only has A been demonstrated, but also B; and the proof has been just as rigorous in the latter case as in the former. Accordingly, so far as this part of the theory goes, it can never possibly be upset by any future observations of any sort. Wheresoever and whensoever the tetrads vanish, there and then—come what may—the divisibility into g and s must remain unshaken and unshakeable. We have here what may be fairly called evidence of the higher degree.
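Proposition B can likewise be illustrated constructively, though what follows is only an illustration and not the proof referred to in the text. When the correlations are positive and every tetrad vanishes, the ratio r_ij r_ik / r_jk comes out the same for every pair of other tests j, k (two such ratios agree exactly when the corresponding tetrad difference is zero). Its square root can therefore be taken as g_i, and then g_i g_j reproduces r_ij, since g_i² g_j² = (r_ij r_ik / r_jk)(r_ij r_jk / r_ik) = r_ij². The sketch below, with hypothetical function names and an invented table, averages the ratio over all pairs and checks the reconstruction.

```python
from itertools import combinations
from math import sqrt


def g_saturations(r):
    """g_i from the ratio r_ij * r_ik / r_jk, averaged over all pairs (j, k) of other tests.

    Assumes positive correlations and vanishing tetrads, so every ratio for a given i is equal.
    """
    n = len(r)
    g = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        ratios = [r[i][j] * r[i][k] / r[j][k] for j, k in combinations(others, 2)]
        g.append(sqrt(sum(ratios) / len(ratios)))
    return g


def max_residual(r, g):
    """Largest gap between an observed r_ij and the two-factor value g_i * g_j."""
    n = len(r)
    return max(abs(r[i][j] - g[i] * g[j]) for i in range(n) for j in range(n) if i != j)


# An invented table whose tetrads vanish (generated from g = 0.9, 0.8, 0.7, 0.6, 0.5).
true_g = [0.9, 0.8, 0.7, 0.6, 0.5]
r = [[1.0 if i == j else true_g[i] * true_g[j] for j in range(5)] for i in range(5)]
g = g_saturations(r)
print([round(x, 3) for x in g])
print(f"max |r_ij - g_i g_j| = {max_residual(r, g):.1e}")
```

The function sees only the table r, not the values used to generate it, and still recovers them.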

It is, then, definitely untrue to say, as some authors have done, 
that this reverse proof B only shows that the division “leads to no 
contradiction of the observed coefficients.” This much would be 
supplied by the lower degree of evidence, whereas in B we have the 
higher degree.

But if this evidence is really conclusive, how did the said critics ever come to declare that it is not so? By getting confused, it would seem, between the meanings of “may” and “must,” as also between those of “divided” and “divisible.” To clear up the whole matter of division, let us take that of a quadrilateral figure into two triangles. We get three leading cases.

I. In this case we suppose the observation to have somehow been made that the opposite sides of a given quadrilateral are parallel. And from this observation we suppose the theorem to be deduced that the figure may be made up of, and therefore (ideally) divisible into, two right triangles. The “may” seems in this case justifiable, in the sense that such a make-up will occur under some conditions (as when the figure happens to have at least one of its angles right) but not under other conditions. Thus here the observation of parallelism supports, but does not finally prove, the alleged make-up; it is consistent therewith. The evidence is what we have called that of the lower degree.

II. In this second case there has again been made the observation 
of parallelism. But this time the deduced theorem is that the figure 
must be made up of and divisible into two equal triangles. The 
“must” seems here properly to indicate that such a make-up occurs
always and everywhere. Thus the observation is not only consistent 
with the make-up but rigorously proves it. The evidence is of the 
sort that we have called that of the higher degree. 

III. In this case the observation has been fuller. It is supposed 
to have shown, not only that the opposite sides are parallel, but also 
that (at least) one angle is right. Here the theorem can be correctly deduced that, if the figure is divided into triangles at all, it must be divided into two right triangles.

Now let us compare these three geometrical cases with our psy- 
chological one where the observation of hierarchy is taken to prove 
divisibility into g and s. The scope of the psychological proof is indeed
less than that in case III. But it is more than that in case I and the 
same as that in case II. Thus, as already said, it is more than is 
attainable in the great majority of scientific theorems, even physical. 


2. PROOF OF EXISTENCE 


Much rarer, but still announced with so much emphasis that it 
cannot well be overlooked, has been the objection that the two factors, 
g and s, may after all have no “existence.” Something that sounds
rather like this came even from the pen of Mackie, when he said: 
the entities represented by g and s may not actually exist. (This Journal, 1928, 
p. 614.) 

But this statement, in itself, is irreproachable. If, for instance, we take the “entities” measured by g and s to consist in psycho-physical energy, then certainly its actual existence is at least open to reasonable doubt. Moreover, Mackie, with a fairness and insight characteristic of him, at once proceeds to admit that such a doubt has always been recognized even by the proponents of the two factor theory. Mackie also concedes that


in this respect the theory is in no worse case than many accepted theories in the
domain of physical science. 


Here again, what he says cannot be disputed; “weight,” “mass,” “force,” “density,” “gravity” and so forth are indubitably entities of a more or less fictitious character.

But what was so unimpeachable in the writings of Mackie is apt to degenerate in less capable hands. With at least one writer we find the non-existence attributed, no longer to what “is represented by” or “underlies” the magnitudes g and s, but to these magnitudes themselves!


We are told (with extraordinary re-iteration) that 


the vanishment of the tetrads constitutes in no sense proof of the existence of g and 
s. (Tryon, Psych. Rev., Vol. XXXIX, 1932.)

To assert thus that the proof does not amount to that of “existence” is to misconceive the whole scope of science, and even of logic. All proofs are essentially affirmations or negations of existence. To affirm that any premises lead to any conclusion—as is done for instance in our A and B—is equivalent to affirming that if and when the situation indicated by the premises “exists,” so also must “exist” that indicated by the conclusion.


3. CONFLICT OF IDEAL DIVISIONS 


Let us pass on to another difficulty which has found expression in 
writings of greater number and higher standard. This time the 
trouble has lain in the assumption—none the less effective for being 
tacit—that the divisions of a variable (such as an ability score) 
into one set of factors precludes its division into another set.

Thus, one author of repute writes that variables 


must not necessarily be divided according to the two factor theory, for¹ other modes
of division are possible. 


Another writer in similar strain calls the two factor division 


only one possibility among many others. 





¹ The italics are our own.
At this point, we are obliged to make a fundamental distinction
between divisions of the two kinds that are called respectively real and 
ideal. The former term means that the items are separated in space 
or time. The latter, that they are only separated as objects of 
thought. 

Now only the ideal kind of division is involved in what has been 
called the “general” theory of two factors; and as a rule the ideal
divisions do not at all interfere with each other. Thus, if we return 
to our previous illustration, we see at once that dividing a quadrilateral 
ideally into two triangles for the purpose of one argument does not 
prevent us from also dividing it ideally into two quadrilaterals for 
another purpose. And if the two ways of division are both proved to 
be possible, then they cannot be inconsistent! They can only be 
different aspects of one and the same situation. The question as to 
whether one way of division, or the other, or both, or neither should 
be adopted by science has to be decided solely on grounds of scientific 
utility. Each way has to justify itself. 


4. CONFLICT OF REAL DIVISIONS 


So soon, however, as we turn from the ideal division of the abstract 
magnitudes to the real division in space or time of the entities taken to 
underlie these magnitudes, then matters become different. We are 
no longer in the sphere of the general theory, which deals solely with 
magnitudes. We pass into what has been called the sub-theories, 
by which these magnitudes are “interpreted.”¹ Hereupon, no doubt, the possibility of conflict between different ways of division very greatly increases. We cannot fashion a tree into a mast
and at the same time cut it up into matches. 

An instance of more than historical interest is where each magni- 
tude to be divided consists of the total score got by throwing a set of 
dice. Ostensibly, the dice are merely used as a convenient method of 
obtaining a set of variable magnitudes having known degrees of 
correlation. But implicitly, the concrete nature of the dice—notably 
their incapacity for breaking up into separate parts but their separa- 
bility from one another and their scarcely less ready assemblage into 
groups—all these characters are also more or less surreptitiously introduced into the situation. In particular they have been made to play
a dominant part in criticising the general theory of two factors. 





¹ See the present writer’s Manifold Sub-theories of the Two Factors, Psych. Rev., Vol. XXVII, 1920.
As regards at any rate the general theory, such introduction of
real characters is, as just said, quite illegitimate. Moreover, even as
regards interpretative sub-theories, the conflict goes less deep than it
seems to do. Thus, two apparently most unlike ways of dividing 
up the entities that underlie an ability score consist in taking these 
entities to be respectively, (a) one single psycho-physical energy and a 
multitude of engines, and (b) no common factor at all but only a large 
number of independent genes. Yet in truth even these two ways of 
dividing fit together wonderfully well. For what could be more 
reasonable than to suggest that all the genes conjointly determine the 
magnitude of the energy as also the efficiency of the engines? If 
this be so, then, in this case, our g measures both the amount of energy 
and also the efficiency of the genes. On the whole, we may say that 
even in the case of real things the lines along which scores or other 
variables may be simultaneously divided are indefinitely numerous. 
The factorization of these variables has to be prepared to follow 
any or all of these lines at the same time. 

To conclude this section, we may cite some criticisms, which are 
essentially similar to those preceding, but are phrased more elusively. 

In one case the author, after saying (without evidence) that
abilities can be divided into factors other than g and s, infers that
therefore g and s “are not needed.” Against this inference, we 
must protest as before that the final arbitrator is scientific utility. 
We must urge that if the g and s should turn out to render any scien- 
tific service not done better by other means, then psychology does 
“need” them, regardless of any other divisions that may be possible, 
and even upon occasion, useful. Another example of these evasive 
sayings is the statement that the criterion of zero tetrads “no more
proves the abilities to be divided into g and s than it proves them to be 
divided in various other ways” (left unspecified). This is, frankly 
speaking, ridiculous. The criterion has been shown to prove that 
the score must be divisible into g and s; it has not been shown to 
prove that the score must be divisible “in various other ways.” 


5. RIVAL THEORIES OF DIVISION 


After all these remarks on the general compatibility of the divisions 
into g and s with any modes of division proposed by other theories, 
let us here briefly inquire what these other proposals have so far 
actually been. 

In literature a prominent place used to be accorded to the “sampling theory” of Thomson. This, as we have already seen, certainly does not conflict with the theory of two factors. It does, however, suffer wreck on its own account. As has been said about it,


the rocks that have sunk it are its equivocality in itself and its disagreement from the facts admitted universally.¹


To the detailed demonstration of its failure Thomson appears to have 
found no reply. 

Let us turn to the work of Thorndike. Here is an author who has 
frequently, and even recently, been represented as the principal 
opponent of the two factor theory. But on proceeding to examine his 
own recent writings we find such statements as the following:


amounts of intellect . . . are amounts of some unified coherent factor in nature which can be properly isolated from other non-intellectual factors.²


If these “amounts of intellect” differ at all from our g, I, for my part,
have yet to see how. Even as regards the sub-theories, since I am
now enjoying the privilege of collaborating with this distinguished 
authority, I have every confidence that we shall arrive at views 
substantially the same. 

Another outstanding name in this field is that of Hull. As regards 
the general theory he frankly accepts g. But he attaches to it the 
sub-theory that it may perhaps be only 


a kind of mathematical expression of the totality of all the group factors. 
These “group factors” he takes to constitute 
the actual physical determiners of aptitudes and test abilities. 


But up to the present, to the best of my knowledge, he has not 
begun to support these factors statistically, or even to name them, 
much less to indicate their physical nature. We shall all await his 
results with great interest. 

One more author who cannot be omitted here is Truman Kelley. 
The relation of his findings to those of the present writer presents one 
strange feature. He, himself, states explicitly enough that 
on the whole the two sets of findings are quite remarkable in harmony, the agreement being in the matter of spatial, numerical, and even general factors.³
(Page 23.) 

¹ Brit. J. Ed. Psychology, Vol. I, 1931, p. 20.
² The Measurement of Intelligence, 1926, p. 63.
³ Crossroads in the Mind of Man, p. 23.

And yet there is something about the personal tone of the book which has conveyed to most of his readers the impression that, on the contrary, he and I are in fundamental disagreement! Recently I have had occasion to see his experimental results worked out along the lines customary with the supporters of the two factor theory. Hereby an interesting opportunity was afforded of ascertaining how far his method really did agree with ours. The general results are shown below.¹ Not only did we agree in finding the two factors, general and specific, but even (with one exception) in determining the amount of correlation due to these respectively; the agreement is as good as the errors involved should lead us to expect. And this fact is the more remarkable in view of the nature of the system of tests which he employed; this was very peculiar and certainly did admit of factorization in extremely diverse ways.

The fact that in spite of these possibilities of divergence we actually 
agree so well, would seem to indicate that in essence our methods of 
procedure were really the same. And such was actually the case. 
For apart from his final adjustments by the method of least squares— 
which make only minor (inacceptable) changes—he based his work 
throughout on tetrad differences. And these, of course, are the very 
essence of the theory of two factors. Moreover, even the particular 
way in which the tetrad differences were applied by him (which was by 
taking the mean of all that are involved in one particular correlation) 
had already long been current with us. Both Davey and Hargreaves 
based on it their respective researches made public in 1926.²





¹ I. CORRELATIONS OF g WITH EACH TEST

Tests                1     2     3     4     5     6     7     8     9
Kelley method      .40   .38   .23   .21   .66   .69   .59   .39   .65
Spearman method    .36   .81   .12   .88   .68   .76   .74   .70   .72

The correlation between the upper and lower lines of values is about .7.

II. SIGNIFICANT OVERLAP

Tests 1 and 2        Tests 3 and 4

² The usage of these mean tetrads is only indicated cursorily in the publications of these authors (Davey, Brit. J. Psych., Vol. XVII, 1926; Hargreaves, “The Faculty of Imagination,” Cambridge Univ. Press, 1927). But it is given with all [...] in their Theses, which are to be seen in the Library of the University of London.
Similar is the lesson from the work just published by G. M. Smith,
Jr., “under the general direction” of Garrett. Smith too employs
the method of tetrad differences. But in addition he uses a new
method due to Hotelling (see later on). The tetrad method must
of course produce the familiar g and the group factors associated
therewith. The Hotelling method also produces a general factor
associated with group factors. Now, as regards the tetrad method,
Smith uses this to determine the group factors but not the g (the 
latter he scarcely mentions). On the other hand, as regards the 
Hotelling method, he only employs this to determine the general factor, 
hardly alluding to the associated group factors. And so he remains 
sitting between two stools; he fails to reach any complete consistent 
analysis of his data. 

The present writer has supplied the missing correlations of the tests 
with the general factor got from the tetrad method, and has compared 
these correlations with the corresponding values obtained by Smith 
from the method of Hotelling. The latter set of values proves to be 
throughout somewhat larger, a fact which agrees with the finding of 
Holzinger that they are too large. But the correlation between the 
two sets of values, despite their extremely small scatters, amounts 
to no less than .9. So far as the evidence goes, then, the two general 
factors turn out on closer examination to be substantially one and the 
same. 
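Hotelling’s method is what is now usually called principal-component analysis. The minimal sketch below assumes NumPy and a hypothetical correlation table with an exact g-and-s structure. It compares the loadings on the first principal component with the g-saturations built into the table. The component loadings come out larger overall (their sum of squares is the largest eigenvalue, which here exceeds the sum of squared g-saturations), though not necessarily test by test, because the component also absorbs part of each test’s specific variance. The two sets nevertheless correlate very highly. This mirrors the pattern described above on invented numbers; it is not a reanalysis of Smith’s data.

```python
import numpy as np

# Hypothetical g-saturations (illustrative, not Smith's data).
g = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])

# Correlation table for a pure g-and-s structure: off-diagonal r_ij = g_i g_j, unit diagonal.
R = np.outer(g, g)
np.fill_diagonal(R, 1.0)

# First principal component: eigenvector of the largest eigenvalue, scaled by sqrt(eigenvalue).
eigenvalues, eigenvectors = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order
first = eigenvectors[:, -1] * np.sqrt(eigenvalues[-1])
first = first if first.sum() > 0 else -first  # an eigenvector's sign is arbitrary

print("g-saturations:", np.round(g, 3))
print("PC loadings:  ", np.round(first, 3))
print("sums of squares:", round(float(g @ g), 3), "vs", round(float(first @ first), 3))
print("correlation:", round(float(np.corrcoef(g, first)[0, 1]), 3))
```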


6. NEW FACTOR SCHEMES 


On the whole, however, new procedures of factorization have 
undoubtedly made their appearance, and this is the pleasing result 
of the mathematics of factorization being studied in a much more 
general and fundamental manner than heretofore. In particular, 
increasingly vigorous attempts are being made to deal in a more 
direct manner with the cases where correlations are not hierarchical. 
Previously (as we showed before) such cases could only be approached 
indirectly through the medium of the hierarchical ones. 

Not long ago, the hopes entertained in this new direction were 
extremely high. The time had come, it was thought, when any table 
of correlations, hierarchical or otherwise, had only to be submitted 
to the mathematician, and he, by dint of elaborate formulae used
quite mechanically, would be able to calculate the total system of
parent factors from which the table must have sprung.





1 Group Factors in Mental Tests. Archives of Psychology, No. 156, 1933. 
But such wonderful prospects were shattered by Holzinger and
Swineford. These authors showed that, save in very exceptional
cases, the system of parent factors could not be rigorously inferred.
On the contrary such a table they proved to be compatible with an 
infinitely large number of systems. 
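One familiar way to see this indeterminacy is by rotation. This is the standard rotational argument, offered as an illustration; it is not necessarily the route Holzinger and Swineford took. Assuming NumPy and a hypothetical loading matrix L for five tests on two factors, any orthogonal matrix T yields different loadings L T that reproduce exactly the same correlations, since (L T)(L T)ᵀ = L T Tᵀ Lᵀ = L Lᵀ. Because the rotation angle can be anything, the same table fits infinitely many factor systems.

```python
import numpy as np


def rotation(theta):
    """2 x 2 orthogonal (rotation) matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Hypothetical loadings of five tests on two factors (illustrative values).
L = np.array([[0.8, 0.3], [0.7, 0.4], [0.6, 0.1], [0.5, 0.5], [0.4, 0.2]])
# Common-factor part of the table: off-diagonal entries are the r_ij; specific variance fills the diagonal to 1.
communal = L @ L.T

for theta in (0.3, 1.0, 2.5):
    L_rot = L @ rotation(theta)
    same = np.allclose(L_rot @ L_rot.T, communal)
    print(f"theta={theta}: loadings differ={not np.allclose(L_rot, L)}, same correlations={same}")
```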

Chastened by the disappointment, the workers in this field became 
less ambitious. Instead of the vain efforts to ascertain the one and 
only possible system of factors, their aim now was to ascertain what 
system could with the smallest number of factors best satisfy some 
more or less plausible condition. So it was that Thurstone tried to 
determine the fewest factors that would best fit the observed correla- 
tions. Hotelling sought, rather, the fewest factors that would make 
the largest possible contributions to the observed variances. 

What meed of success is to be obtained along these directions has
still to be finally settled. But considering who are the authors at work 
and what they have done already, there can be little doubt but that 
the eventual achievement will be great. 

Still there is at present no reason to suppose that in the sphere of 
abilities any successes along these lines will enter into conflict with, 
or supersede, the older analysis into g and s. Where they do offer a prospect, not indeed of conflict with g and s, but of largely supplementing these, is in the sphere of character traits.


7. SUMMARY 


In the preceding article, we have been considering a variety of 
objections raised to the theory of two factors on the ground that, 
however much they may be corroborated by observation and unique 
in themselves, still their general lines of proof have been logically 
inadequate. Typical has been the argument that satisfaction of the 
criterion of hierarchy 


no more proves the score to be divided into g and s than it proves it to be divided in various other ways.


Similarly, the division into g and s is said “not to be needed.” 

But all such arguments have upon closer examination shown 
themselves to proceed from unclear notions, and particularly from an 
inadequate grasp of the general principles of division—indeed, the 
principles of all empirical science. 





¹ This Journal, Vol. XXIII, 1932, p. 247.